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CASE STUDY: Kroll Fectuel Data 


Kroll Factual Data of Loveland, Colorado, is a longtime provider of 
information services to the mortgage industry. The firm wanted to 
optimize its server infrastructure to better meet spikes in demand and 
reduce data center costs. Kroll Factual Data virtualized its data center 
using Windows Server® 2008 and Hyper-V™ technology, consolidating 
650 servers to 22. It further streamlined its infrastructure using 
Microsoft® System Center data center solutions to monitor and manage 
its physical and virtual landscape, and Microsoft Visual Studio” 
development tools to quickly develop applications. 


With its new optimized infrastructure, the company can grow faster, 
scale quickly to meet customer needs and dramatically reduce IT costs. 
Kroll Factual Data has cut annual hardware expenditures by tens of 
thousands of dollars, and energy costs by U.S. $442,554 annually. 


= asq| To download the case study, 
snap this tag or text VIRTUAL to 21710* 
N Get the free app for your phone at http://gettag.mobi 


*Standard messaging and data charges apply. 


To read the full case study, visit 
itseverybodysbusiness.com/virtual 
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Editor’s Tip 


We're resurfacing our most 

popular articles in the SQL 

Server classics column in the 

July issue. Which SQL Mag articles are your 

favorites? Let me know at mkeller@ sqlmag.com. 
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Readers Weigh In on 
Microsoft’s Support for 
Small Businesses 


our responses to “Is Microsoft Leaving Small 

Businesses Behind?” April 2010, InstantDoc 
ID 103615, indicate that some of you have strong 
feelings that Microsoft’s focus on the enterprise has 
indeed had the unwanted effect of leaving the small 
business sector behind. At least that’s the perception. 
Readers have noted the high cost of the enterprise 
products and the enterprise-oriented feature sets in 
products intended for the small business market. 
Some readers also lamented the loss of simplicity 
and ease of use that Microsoft products have been 
known for. In this column ГЇЇ share insight from 
readers about what they perceive as Microsoft’s 
growing distance from the needs and wants of the 
small business community. 


Is the Price Point a Fit for Small 
Businesses? 

Not surprisingly, in this still lean economy, several 
readers noted that the price increases of Microsoft 
products make it more difficult for small businesses 
to continue to purchase product upgrades. Dean 
Zimmer noted, “The increase in cost and complexity, 
and decrease in small business focus has been quite 
noticeable the last 5 years. We will not be upgrading to 
VS2010 [Visual Studio 2010], we will stop at VS2008 
and look for alternatives." Likewise, Kurt Survance felt 
there was a big impact on pricing for smaller customers. 
“The impact of the new SQL Server pricing is heaviest 
on small business, but the additional revenue seems to 
be targeted for features and editions benefitting the 
enterprise client. SQL 2008 R2 is a case in point. If you 
are not seduced by the buzz about BI for the masses, 
there is little in R2 of much worth except the two new 
premium editions and some enterprise management 
features useful only to very large installations." 


Do Small Businesses Need a Simpler 
Offering? 

Price, while important, was only part of the equation. 
Increasing complexity was also a concern. David 
Dorvak lamented the demise of the simpler Data 
Transformation Services (DTS) product in favor 
of the enterprise-oriented SQL Server Integration 
Services (SSIS). *With Integration Services Microsoft 
completely turned its back on those of us who value 
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and appreciate the simplicity and ease of use that SOL 
Server used to provide. I understand that Integration 
Services can provide very robust functionality if 
used properly, but it is definitely at the expense of 
simplicity." 


Do Small Businesses Want a Scaled- 
Down Enterprise Offering? 

Perhaps most outstanding is the feeling that Microsoft 
has lost its small business roots in its quest to be 
an enterprise player. Andrew Whittington pointed 
out “We often end up wondering how much longer 
Microsoft can continue on this path of taking away 
what customers want, replacing it with what Microsoft 
_thinks_ they want!” Doug Thompson agreed that as 
Microsoft gets larger it has become more IBM-like. 
“Microsoft is vying to be in the same space as its old 
foe IBM—31f it could sell mainframes it would.” 

Dorvak also questioned whether Microsoft might 
be on the wrong track with its single-minded enterprise 
focus. “For every AT&T, there аге 10's, 100% or 1000s 
of companies our size. Microsoft had better think care- 
fully about where it goes in the future.” 

In comments to the editorial online, Chipman 
observed that Microsoft’s attention on the enterprise 
is leaving the door open at the low-end of the market, 
“Its the 80/20 rule where the 20% of the small 
businesses/departments are neglected in favor of the 
80% representing IT departments or the enterprise, 
who make the big purchasing decisions. This short- 
sightedness opens the way for the competition, 
such as MySQL, who are doing nothing more than 
taking a leaf out of Microsoft’s original playbook by 
offering more affordable, easier to use solutions for 
common business problems. It will be interesting to 
see how it all plays out.” 


Two Sides to Every Story 
Га like to thank everyone for their comments. I didn't 
get responses from readers who felt warm and fuzzy 
about the way Microsoft is embracing small businesses. 
If you have some thoughts about Microsoft and small 
business, particularly if you're in a small business and 
you think Microsoft is doing a great job, tell us about it 
at motey@sqlmag.com or letters@sqlmag.com. В 
InstantDoc ID 125116 


Michael Otey 


(motey@ sqlmag.com) is technical director 
for Windows IT Pro and SQL Server 
Magazine and author of Microsoft SQL Server 
2008 New Features (Osborne/McGraw-Hill). 
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NEW FROM CONFIO 


Ignite 8 is a comprehensive SOL Server 
performance solution. Ignite focuses 
on measuring response time, and adds 
server resources, expert advice, and 
historical trends ranging from one 


second to one year. 


» QUERY BOTTLENECKS 


Answer why applications wait 
on SQL Server. 


» EXECUTION PLAN CHANGES 
Detect performance impact 
on applications. 

» RESOURCE HOG QUERIES 


Find where server resources 
are wasted. 


Confio Software, 
Boulder Colorado. 


CONFIO* 


Visit Confio.com and download a free trial so hot, it requires adequate ventilation. 


Reporting on Non-Existent Data 


Tracks Transaction-Less Dates” (November 2009, 

InstantDoc ID 102744), Saravanan Radhakrisnan 
presented a challenging task that I refer to as reporting 
on non-existent data. He had to write a T-SQL query 
that would determine which stores didn’t have any 
transactions during a one-week period, but the table 
being queried included only the stores’ IDs and the 
dates on which each store had a transaction. Listing 1 
shows his solution. Although this query works, it has 
some shortcomings: 

1. As Radhakrisnan mentions, if none of the 
stores have a transaction on a certain day, the query 
won't return any results for that particular day for 
all the stores. So, for example, if all the stores were 
closed on a national holiday and therefore didn’t have 
any sales, that day won't appear in the results. 

2. If a store doesn't have any sales for all the 
days in the specified period, that store won't appear 
in the results. 

3. The query uses T-SQL functionality that isn't 
recommended because of the poor performance it 
can cause. Specifically, the query uses three derived 
queries with the DISTINCT clause and the NOT IN 
construction. You can encounter performance prob- 
lems when derived tables get too large to use indexes 
for query optimization. 


[| n the Reader to Reader article “T-SQL Statement 


Td like to call attention to several different ways to 
work around these shortcomings. 


How to Include All the Days 

If you want the results to include all the days in 
the reporting period, even those days without any 
transactions, you can use the code in Listing 2. 
Here’s how this code works. After creating the dbo 
.MySalesTable2 table, the code populates it with data 
that has a “hole” in the sales date range. (This is the 
same data that Radhakrisnan used for his query, 
except that the transaction for October 03, 2009, 
isn’t inserted.) Then, the query in callout A runs. The 
first statement in callout A declares the @BofWeek 
variable, which defines the first day of the reporting 
period (in this case, a week). The query uses the 
@BofWeek variable as a base in the table constructor 
clause to generate the seven sequential days needed for 
the reporting period. 

In addition to including all the days, the revamped 
query in Listing 2 performs better than the original 
query in Listing 1 because it minimizes the use of 
T-SQL functionality that’s not recommended. As you 
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can see, the revamped query uses only one derived 
query with the DISTINCT clause instead of three. 
Plus, the NOT IN construction is replaced with a 
LEFT OUTER JOIN and a WHERE clause. In my 
testing environment, the revamped query was more 
than 15 percent faster than the original query. 


How to Include Every Store 

If you want the results to include all the stores, even 
those stores without any transactions during the 
reporting period, you need to change the query's 
internal logic. Instead of using transactional data (in 
OLTP systems) or a fact table (in data warehouse/ 
OLAP systems) as a source for obtaining the list of 
stores, you need to introduce a look-up table (OLTP) 
or dimension table (data warehouse/OLAP). Then, to 
get the list of stores, you replace 


Gennadiy 
Chornenkyy 


(SELECT DISTINCT Store ID 
FROM dbo.MySalesTable2) st 


ms MORE on the WEB 


Download the code at 
InstantDoc ID 125130. 


(SELECT Store_ID 
FROM dbo.MyStoresTable) st 


in the revamped query. 

The code in Listing 3 shows this solution. It creates 
a new table named dbo.MyStoresTable. This table is 
populated with the five stores in dbo.MySalesTable2 
(stores with IDs 100, 200, 300, 400, and 500) and adds 
two new stores (stores with IDs 600 and 700) that don’t 
have any transactions. If you run the code in Listing 3, 
you'll see that the results include all seven stores, even 
though stores 600 and 700 don’t have any transactions 
during the reporting period. 


How to Optimize Performance 

Sometimes when you have a large number of rows (e.g., 
5,000 to 10,000 rows) in the derived tables and large 
fact tables, implementing indexed temporary tables 
can increase query performance. I created a version of 


LISTING 1: The Original Query 


SELECT stl.Store ID, st2.NoTransactionDate 
FROM (SELECT DISTINCT Store ID FROM MySalesTable (NOLOCK)) AS 5+1, 
(SELECT DISTINCT TransactionDate AS NoTransactionDate 
FROM MySalesTable (NOLOCK)) AS st2 
WHERE st2.NoTransactionDate NOT IN 
(SELECT DISTINCT st3.TransactionDate FROM MySalesTable st3 
(NOLOCK) WHERE st3.store id = stl.store id) 
ORDER BY sti.store id, st2.NoTransactionDate 
GO 


June 2010 7 


LISTING 2: Query That Includes All Days 


CREATE TABLE dbo.MySalesTable2 CREATE TABLE [dbo]. [MyStoresTable] ( 
( Store_ID INT, TransactionDate SMALLDATETIME ) [Store ID] [int] NOT NULL, 
Go CONSTRAINT [PK MyStores] PRIMARY KEY CLUSTERED ([Store ID] ASC) ) 
INSERT INTO dbo.MyStoresTable (Store ID) 
VALUES (109), (200) , (300) , (400), (500) , (600) , (700) 


LISTING 3: Query That Includes All Stores 


-- Populate the table. 

INSERT INTO dbo.MySalesTable2 
SELECT 100, 2089-10-05’ UNION DECLARE @BofWeek datetime = '2009-10-01 00:00:00" 
SELECT 200, '2009-10-05' UNION SELECT st2.Store ID, st2.Day of Week 

SELECT 200, '2009-10-06' UNION FROM 

SELECT 300, '2009-10-01' UNION (SELECT st.Store ID, DATES.Day of Week 
SELECT 300, '2009-10-07' UNION FROM ( 

SELECT 400, '2009-10-04' UNION VALUES 


SELECT 400, '2009-10-06' UNION (CONVERT(varchar(35) ,@BofWeek ,101)), 

SELECT 500, '2009-10-01' UNION (CONVERT (varchar (35) , dateadd (DD, 1, GBofWeek) , 101) ) , 
SELECT 500, '2009-10-02' UNION Р (CONVERT (varchar (35) ,dateadd(DD,2,@BofWeek) ,191)) , 
-- Transaction for October 93, 2009, not inserted. (CONVERT (varchar (35) , dateadd (DD, 3,@BofWeek) , 101)) , 
-- SELECT 500, '2009-10-03' UNION (CONVERT (varchar (35) , dateadd (DD, 4, @BofWeek) , 101)) , 
SELECT 500, '2009-10-04' UNION (CONVERT (varchar (35) , dateadd (DD, 5, GBofWeek) , 101) ) , 


SELECT 500, '2009-10-05' UNION (CONVERT (varchar (35) , dateadd (DD, 6, GBofWeek) , 101) ) 
SELECT 500, '2009-10-06' UNION ) DATES (Day. of. Week) 


SELECT 500, '2009-10-07' CROSS JOIN 
GO (SELECT Store ID FROM dbo.MyStoresTable ) st 
JAS st2 


DECLARE @BofWeek datetime = '2009-10-01 00:00:00" 


LEFT JOIN dbo.MySalesTable2 st3 
SELECT st2.Store ID, st2.Day of Week 


ON st3.Store ID = st2.Store ID AND 


FROM st3.TransactionDate = st2.Day of Week 
(SELECT st.Store ID, DATES.Day of Week WHERE st3.TransactionDate IS NULL 
FROM С ORDER BY st2.Store ID, st2.Day of Week 
VALUES GO 


(CONVERT (varchar (35) ,@BofWeek ,101)), 
(CONVERT (varchar (35) , dateadd (DD, 1, 


GBofWeek),101)), 


(CONVERT (varchar (35) , dateadd (DD,2, 


GBofWeek),101)), 


CCONVERT (varchar (35) ,dateadd(DD, 3, 


GBofWeek),101)), 


(CONVERT (varchar (35) , dateadd (DD,4, 


GBofWeek),101)), 


(CONVERT (varchar (35) , dateadd (DD,5, 


GBofWeek),101)), 


(CONVERT (varchar (35) , dateadd (DD,6, 


@BofWeek) ,101)) 
) DATES (Day. of. Week) 
CROSS JOIN 


(SELECT DISTINCT Store ID FROM dbo 
.MySalesTable2 ) st 


) AS st2 


LEFT JOIN dbo.MySalesTable2 st3 
ON st3.Store ID - st2.Store ID AND 
st3.TransactionDate - st2.Day of Week 
WHERE st3.TransactionDate IS NULL 
ORDER BY st2.Store ID, st2.Day of Week 


You need to be cautious with solutions that use 
indexed temporary tables. A solution might work well 
in one environment but timeout in another, killing 
your application. For example, I initially tested Query- 
UsingIndexedTemporaryTables.sql using a temporary 
table with 15,000 rows. When I changed number of 
rows for the temporary table to 16,000, the query's 
response time increased more than four times—from 
120ms to 510ms. So, you need to know your produc- 
tion system workload types, SQL instance configu- 
ration, and hardware limitations if you plan to use 
indexed temporary tables. 

Another way to optimize the performance of 
queries is to use the EXCEPT and INTERSECT 


GO operators, which were introduced in SQL Server 2005. 
These set-based operators can increase efficiency when 
you need to work with large data sets. 

the revamped query (QueryUsingIndexedTemporary- I created a version of the revamped query (Query- 
Tables.sql) that uses an indexed temporary table. This | UsingEXCEPTOperator.sgl) that uses the EXCEPT 
code uses Radhakrisnan’s original data (i.e, data that operator. Once again, this code uses Radhakrisnan’s 
includes the October 03, 2009, transaction), which was original data. QueryUsingEXCEPTOperator.sql pro- 
created with MySalesTable.Table.sq]. vides the fastest and most stable performance. It ran 
Like the queries in Listings 2 and 3, QueryUsing- five times faster than Radhakrisnan’s original query. 
IndexedTemporaryTables.sql creates the @BofWeek (А table with a million rows was used for the tests.) 
variable, which defines the first day of the reporting You can download the solutions I discussed (as well 
period. Next, it uses the CREATE TABLE command as MySalesTable.Table.sql) from the SQL Server Maga- 
to create the #StoreDate local temporary table, which zine website. I’ve provided two versions of the code. The 
has two columns: Store ID and Transaction Date. first set of listings is compatible with SQL Server 2008. 
Using the INSERT INTO...SELECT clause, the code (These are the listings you see here.) The second set can 
populates the #StoreDate temporary table with all pos- Бе executed in a SQL Server 2005 environment. ЕШ 
sible Store ID and Transaction Date combinations. —Gennadiy Chornenkyy, 
Finally, the code uses a CREATE INDEX statement to data architect, ADP Canada 
create an index for the #StoreDate temporary table. InstantDoc ID 125130 
8 June 2010 SQL Server Magazine * www.sqlmag.com 


Smarter technology for a Smarter Planet: 


The tools of collaboration 
vs the tools of distraction. 


How do you enable a planet with almost a billion mobile workers to instantly access the people and 
information needed to get jobs done? Providing a single, integrated platform for all the communication 
and collaboration tools is a good first step, but on a smarter planet, companies must adapt to the way 
people work. IBM's open approach is designed to help meet your company's specific demands — both 
now and in the future. And with real-time presence awareness and simple, consistent interfaces across 
multiple devices, IBM empowers people to better collaborate with customers, suppliers and partners — 
from the office to home, from the factory floor to the road, and beyond. 


A smarter business needs smarter software, systems and services. 
Let's build a smarter planet. ibm.com/communicate 
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SP WholsActive 


Get detailed information about the sessions 
running on your SQL Server system 


o say I like SP WholsActive is an understate- 

ment. This is probably the most useful and effec- 
tive stored procedure I’ve ever encountered for activity 
monitoring. The purpose of the SP WholsActive 
stored procedure is to give DBAs and developers as 
much performance and workload data about SQL 
Server's internal workings as possible, while retaining 
both flexibility and security. It was written by Boston- 
area consultant and writer Adam Machanic, who 
is also a long-time SQL Server MVP, a founder of 
SQLBlog.com, and one of the most elite individuals 
who are qualified to teach the Microsoft Certified 
Master classes. 

Adam, who has exhaustive knowledge of SQL 
Server internals, knew that he could get more detailed 
information about SQL Server performance than 
what was offered natively through default stored 
procedures, such as SP WHO2 and SP. LOCK, and 
SQL Server Management Studio (SSMS). Therefore, 
he wrote the SP WholsActive stored procedure to 
quickly retrieve information about users’ sessions 
and activities. Let's look at SP. WholIsActive's most 
important features. 


Key Parameters 
SP WholsActive does almost everything you'd 
expect from an activity-monitoring stored proce- 
dure, such as displaying active SPIDs and transac- 
tions and locking and blocking, but it also does a 
variety of things that you aren't typically able to do 
unless you buy a commercial activity-monitoring 
solution. One key feature of the script is flex- 
ibility, so you can enable or disable (or even specify 
different levels of information for) any of the fol- 
lowing parameters: 

* Online help is available by setting the parameter 
@help = 1, which enables the procedure to return 
commentary and details regarding all of the input 
parameters and output column names. 

* Aggregated wait stats, showing the number of 
each kind of wait and the minimum, maximum, 
and average wait times are controlled using the 
(Qget task info parameter with input values of 
0 (don't collect), the default of 1 (lightweight 
collection mode), and 2 (collect all current waits, 
with the minimum, maximum, and average wait 
times). 
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* Query text is available that includes the statements 
that are currently running, and you can optionally 
include the outer batch by setting (Qget outer. 
command = 1. In addition, SP. WhoIsActive 
can pull the execution plan for the active session 
statement using the @get_plans parameter. 

* Deltas of numeric values between the last run and the 
current run of the script can be assigned using the 
(delta, interval = N (where N is seconds) parameter. 

* Filtered results are available on session, login, data- 
base, host, and other columns using simple wild- 
cards similar to the LIKE clause. You can filter 
to include or exclude values, as well as exclude 
sleeping SPIDs and system SPIDs so that you can 
focus on user sessions. 

* Transaction details, such as how many transaction 
log entries have been written for each database, are 
governed by the (Qget. transaction info parameter. 

* Blocks and locks are easily revealed using param- 
eters such as (a)find block leaders, which, when 
combined with sorting by the [blocked session . 
count] column, puts the lead blocking sessions at 
top. Locks are similarly revealed by setting the 
(get locks parameter. 

* Long-term data collection is facilitated via a set 
of features designed for data collection, such as 
defining schema for output or a destination table 
to hold the collected data. 


SP. WholsActive is the epitome of good T-SQL 
coding practices. I encourage you to spend a little time 
perusing the code. You'll note, from beginning to end, 
the strong internal documentation, intuitive and read- 
able naming of variables, and help-style comments 
describing all parameters and output columns. The 
procedure is completely safe against SQL injection 
attacks as well, since it parses input parameter values 
to a list of allowable and validated values. 


System Requirements 
Adam releases new versions of the stored procedure 
at regular intervals at http://tinyurl.com/WholsActive. 
SP. WholsActive requires SQL Server 2005 SP1 or 
later. Users of the stored procedure need VIEW 
SERVER STATE permissions, which can be granted 
via a certificate to minimize security issues. [SQL] 
InstantDoc ID 125107 
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(kevin.kline@ quest.com) is the director of 


technology for SQL Server Solutions at Quest 
Software and a founding board member of 
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SQL in а Nutshell, 3rd edition (O'Reilly). 


Qu: Note 


We want to hear your 
feedback on the Tool Time 
discussion forum at 
sqlforums.windowsitpro 
.com/web/forum/categories 
.aspx?catid= 169&entercat=y. 


SP_WholsActive 


BENEFITS: 

It provides detailed 
information about 

all of the sessions 
running on your 

SQL Server system, 
including what they're 
doing and how they're 
impacting server 
behavior. 


SYSTEM 
REQUIREMENTS: 
SQL Server 2005 
SP1 and later; users 
need VIEW SERVER 
STATE permissions 


HOW TO GET IT: 
You can download 
SP. WholsActive 
from sqlblog.com/ 
tags/Who-is--Active/ 
default.aspx. 
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What Happens if | Drop a 
Clustered Index? 


ve heard that the clustered index is “the data,” 
[| but I don’t fully understand what that means. If I 
drop a clustered index, will I lose the data? 
I get asked this question a lot, and I find that index 
structures tend to confuse people; indexes seem mys- 
terious and, as a result, are unintentionally thought of 
as very complicated. A table can be stored internally 
with or without a clustered index. If a table doesn’t 
have a clustered index, it’s called a heap. If the table has 
a clustered index, it’s often referred to as a clustered 
table. When a clustered index is created, SQL Server 
will temporarily duplicate and sort the data from the 
heap into the clustered index key order (because the 
key defines the ordering of the data) and remove 
the original pages associated with the heap. From this 
point forward, SQL Server will maintain order logi- 
cally through a doubly-linked list and a B+ tree that’s 
used to navigate to specific points within the data. 

In addition, a clustered index helps you quickly 
navigate to the data when queries make use of non- 
clustered indexes—the other main type of index SQL 
Server allows. A nonclustered index provides a way to 
efficiently look up data in the table using a different key 
from the clustered index key. For example, if you create 
a clustered index on EmployeeID in the Employee 
table, then the EmployeeID will be duplicated in each 
nonclustered index record and used for navigation 
from the nonclustered indexes to retrieve columns 
from the clustered index data row. (This process is often 
known as a bookmark lookup or a Key Lookup.) 


However, all of these things change if you drop the 
clustered index on a table. The data isn't removed, just 
the maintenance of order (1.е., the index/navigational 
component of the clustered index). However, non- 
clustered indexes use the clustering key to look up the 
corresponding row of data, so when a clustered index 
Is dropped, the nonclustered indexes must be modified 
to use another method to look up the corresponding 
data row because the clustering key no longer exists. 

The only way to jump directly to a record in the 
table without a clustered index is to use its physical 
location in the database (ie, a particular record 
number on a particular data page in a particular 
data file, known as a row identifier—RID), and 
this physical location must be included in the non- 
clustered indexes now that the table is no longer 
clustered. So when a clustered index is dropped, all 
the nonclustered indexes must be rebuilt to use RIDs 
to look up the corresponding row within the heap. 

Rebuilding all the nonclustered indexes on a 
table can be very expensive. And, if the clustered 
index is also enforcing a relational key (primary or 
unique), it might also have foreign key references. 
Before you can drop a primary key, you need to first 
remove all the referencing foreign keys. So although 
dropping a clustered index doesn't remove data, you 
still need to think very carefully before dropping it. 

I have a huge queue of index-related questions, 
so I’m sure indexing best practices will be a topic 
frequently covered in this column. ГЇЇ tackle another 
indexing question next to keep you on the right track. 


Changing the Definition of a 
Clustered Index 


columns on which I defined my clustered index) 

should be unique, narrow, static, and ever- 
increasing. However, my clustering key is on a GUID. 
Although a GUID is unique, static, and relatively 
narrow, Га like to change my clustering key, and 
therefore change my clustered index definition. How 
can I change the definition of a clustered index? 


Ё learned that my clustering key (i.e., the 


This question is much more complex than it seems, 
and the process you follow is going to depend on 
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whether the clustered index is enforcing a primary 
key constraint. In SQL Server 2000, the DROP_ 
EXISTING clause was added to let you change the 
definition of the clustered index without causing all 
the nonclustered indexes to be rebuilt twice. The first 
rebuild is because when you drop a clustered index, the 
table reverts to being a heap, so all the lookup refer- 
ences in the nonclustered indexes must be changed 
from the clustering key to the row identifier (RID), as 
Idescribed in the answer to the previous question. The 
second nonclustered index rebuild is because when 
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QUESTIONS 
ANSWERED 


you build the clustered index again, all nonclustered 
indexes must use the new clustering key. 

To reduce this obvious churn on the nonclustered 
indexes (along with the associated table locking 
and transaction log generation), SQL Server 2000 
included the DROP EXISTING clause so that 
the clustering key could be changed and the non- 
clustered indexes would need to be rebuilt only once 
(to use the new clustering key). 

However, the bad news is that the DROP - 
EXISTING clause can be used to change only 
indexes that aren't enforcing a primary key or unique 
key constraint (1.е., only indexes created using а 
CREATE INDEX statement). And, in many cases, 
when GUIDs are used as the primary key, the pri- 
mary key constraint definition might have been 
created without speci- 
fying the index type. 


Clustering on a key such as a GUID can result 
in a lot of fragmentation. However, the level of 
fragmentation also depends on how the GUIDs 
are being generated. Often, GUIDs are generated 
at the client or using a function (either the newid( ) 
function or the newsequentialid( ) function) at the 
server. Using the client or the newid( ) function to 
generate GUIDs creates random inserts in the struc- 
ture that’s now ordered by these GUIDs— because 
it’s the clustering key. As a result of the performance 
problems caused by the fragmentation, you might 
want to change your clustering key or even just 
change the function (if it’s server side). If the GUID 
is being generated using a DEFAULT constraint, 
then you might have the option to change the func- 
tion behind the constraint from the newid( ) function 

to the newsequentialid( ) 
function. Although the 


When the index type If you want to change newsequentialid( ) function 


isn’t specified, SQL 


doesn’t guarantee perfect 


Server defaults to cre- the definition of a contiguity or a gap-free 


ating a clustered index 


sequence, it generally cre- 


to enforce the primary clustered index, and the ates values greater than 


key. You can choose 
to enforce the primary 


clustered index is being 


any previously generated. 
(Note that there are cases 


key with a nonclus- used to enforce your when the base value that’s 


tered index by explic- 


used is regenerated. For 


itly stating the index table’s primary key, it’s example, if the server is 


type at definition, but 


restarted, a new starting 


the default index type NOt going to be a simple «aus which might be 


is a clustered index if 


lower than the current 


one doesn’t already Process. value, will be generated.) 


exist. (Note that if a 

clustered index already exists and the index 
type isn’t specified, SQL Server will still allow the 
primary key to be created; it will be enforced using 
a nonclustered index.) 


LISTING 1: Code to Generate the ALTER INDEX 
Statements 


SELECT 
DISABLE_STATEMENT = 
N'ALTER INDEX ' 
QUOTENAME(si.[name], N']') 


+ 


+ N' ON ' 

+ QUOTENAME (sch. [name], N']') 

+ №." 

+ QUOTENAME(OBJECT МАМЕ (so. [object_id]), М']') 
+ N' DISABLE' 


‚ ENABLE STATEMENT = 
N'ALTER INDEX ' 
+ QUOTENAME(si.[name], N']') 
N' ON ' 
QUOTENAME (sch. [name], N']') 
NS E 
QUOTENAME(OBJECT NAME(so.[object id]), N']') 
+ N' REBUILD" 
FROM sys.indexes AS si 
JOIN sys.objects AS so 
ON si.[object id] = so.[object id] 
JOIN sys.schemas AS sch 
ON so.[schema id] = sch.[schema id] 
WHERE si.[object id] = object id('tablename') 
AND si.[index id] » 1 


+++ 


du 
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Even with these excep- 
tions, the fragmentation within this clustered index will 
be drastically reduced. 

So if you still want to change the definition of 
the clustered index, and the clustered index is being 
used to enforce your table’s primary key, it's not going 
to be a simple process. And, this process should be 
done when users aren’t allowed to connect the data- 
base, otherwise data integrity problems can occur. 
Additionally, if you're changing the clustering key 
to use a different column(s), then you'll also need to 
remember to recreate your primary key to be enforced 
by a nonclustered index instead. Here's the process to 
follow to change the definition of a clustered index: 

1. Disable all the table's nonclustered indexes 
so that they aren't automatically rebuilt when the 
clustered index is dropped in step 3. Because this is 
likely to be a one-time operation, use the query in 
Listing 1 (with the desired table name) to generate 
the ALTER INDEX statements. Note that you 
should use the column for DISABLE, STATE- 
MENTS to disable the nonclustered indexes, and be 
sure to keep the enable information handy because 
you'll need it to rebuild the nonclustered indexes 
after you've created the new clustered index. 
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2. Disable any foreign key constraints. This is 
where you want to be careful if there are users using 
the database. In addition, this is also where you 
might want to use the following query to change the 
database to be restricted to only DBO use: 


ALTER DATABASE DatabaseName 
SET RESTRICTED_USER 
WITH ROLLBACK AFTER 5 


The ROLLBACK AFTER 7 clause at the 
end of the ALTER DATABASE statement lets 
you terminate user connections and put the 
database into a restricted state for modifications. 
As for automating the disabling of foreign key 
constraints, I leveraged some of the code from 
sp. fkeys and significantly altered it to generate 
the DISABLE command (similarly to how we did 
this in step 1 for disabling nonclustered indexes), 
which Listing 2 shows. Use the column for 
DISABLE STATEMENTS to disable the foreign 
key constraints, and keep the remaining informa- 
tion handy because you'll need it to reenable and 
recheck the data, as well as verify the foreign key 
constraints after you've recreated the primary key 
as a unique nonclustered index. 

3. Drop the constraint-based clustered index 
using the following query: 


ALTER TABLE schema.tablename 
DROP CONSTRAINT ConstraintName 


4. Create the new clustered index. The new 
clustered index can be constraint-based or a 
regular CREATE INDEX statement. However, 
the clustering key (the key definition that defines 
the clustered index) should be unique, narrow, 
static, and ever-increasing. And although we've 
started to discuss some aspects of how to choose 
a good clustering key, this is an incredibly difficult 
discussion to have in one article. To learn more, 
check out my posts about the clustering key at 
www.sglskillsscom/BLOGS/KIM BER LY/category/ 
Clustering-Key.aspx. 

5. Create the primary key as a constraint- 
based nonclustered index. Because nonclustered 
indexes use the clustering key, you should always 
create nonclustered indexes after creating the 
clustered index, as the following statement 
shows: 


ALTER TABLE schema.tablename 
ADD CONSTRAINT ConstraintName 
PRIMARY KEY NONCLUSTERED (key definition) 


6. Recreate the foreign key constraints. First, 
use the ENABLE STATEMENT generated in 
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LISTING 2: Code to Generate the DISABLE 
Command 


SELECT 
DISABLE STATEMENT - 
N'ALTER TABLE 
+ QUOTENAME(convert(sysname, schema name(o2.schema id)), N']') 
+ №." 
+ QUOTENAME(convert(sysname, o2.name), N']') 
+ N' NOCHECK CONSTRAINT ' 
+ QUOTENAME(convert(sysname, object name(f.object id)), N']') 
, ENABLE STATEMENT = 
N'ALTER TABLE ' 
+ QUOTENAME(convert(sysname, schema name(o2.schema id)), N']') 
+ №." 
+ QUOTENAME(convert(sysname, o2.name), N']') 
+ N' WITH CHECK CHECK CONSTRAINT ' 
+ QUOTENAME(convert(sysname, object name(f.object id)), N']"), 
RECHECK CONSTRAINT = 
N'SELECT OBJECTPROPERTY (OBJECT IDC" 
+ QUOTENAME (convert (sysname, object name(f.object id)), N'''') 
+ N'), ''CnstIsNotTrusted' ')" 
FROM 
Sys.objects AS o1, 
Sys.objects AS o2, 
sys.columns AS c1, 
sys.columns AS c2, 
sys.foreign_keys AS f 
INNER JOIN sys.foreign_key_columns AS k 
ON (k.constraint_object_id = f.object_id) 
INNER JOIN sys.indexes AS i 
ON (f.referenced_object_id = i.object id 
AND f.key index id = i.index id) 
WHERE 
ol.[object id] = object id('tablename') 
AND i.name = 'Primary key Name" 
AND ol.[object id] f.referenced object id 
AND o2.[object id] f.parent object id 
AND cl.[object id] f.referenced object id 
AND c2.[object id] f.parent object id 
AND cl.column id .referenced column id 
AND c2.column id .parent column id 
ORDER BY 1, 2, 3 


Wow 
Aw О ШОШ d 


step 2 to re-enable and recheck all of the foreign 
keys. In this case, you'll want to make sure to 
recheck the data as well using the WITH CHECK 
clause. However, this is likely to be a one-time 
thing, so as long as you kept the information from 
step 2, you should be able to recreate the foreign 
key constraints relatively easily. 

7. Once completed, make sure that all of the 
constraints are considered “trusted” by using the 
RECHECK, CONSTRAINT statements that 
were generated in step 2. 

8. Rebuild all of the nonclustered indexes 
(this is how you enable them again). Use the 
ENABLE STATEMENT created in step 1. 
Rebuilding a nonclustered index is the only way 
to enable them. 


Although this sounds like a complicated process, 
you can analyze it, review it, and script much of the 
code to minimize errors. The end result is that no 
matter what your clustering key is, it can be changed. 
Why you might want to change the clustering key is a 
whole other can of worms that I don't have space to 
gointoin this answer, but keep following the Kimberly 
& Paul: SQL Server Questions Answered blog (www 
.sqImag.com/blogs/sgl-server-questions-answered 
.aspx) and ГЇЇ open that can, and many more, in the 
future! 500] 
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of its enterprise relational database and busi- 

ness intelligence (BI) platform, and it builds on 
the base of functionality established by SQL Server 
2008. However, despite the R2 moniker, Microsoft 
has added an extensive set of new features to SQL 
Server 2008 R2. Although the new support for self- 
service BI and PowerPivot has gotten the lion’s share 
of attention, SQL Server 2008 R2 includes several 
other important enhancements. In this article, we'll 
look at the most important new features in SQL 
Server 2008 R2. 


cH Server 2008 R2 is Microsoft's latest release 


New Editions 

Some of the biggest changes with the R2 release of 
SQL Server 2008 are the new editions that Micro- 
soft has added to the SQL Server lineup. SQL Server 
2008 R2 Datacenter Edition has been added to the 
top of the relational database product lineup and 
brings the SQL Server product editions in-line with 
the Windows Server product editions, including its 
Datacenter Edition. SQL Server 2008 R2 Datacenter 
Edition provides support for systems with up to 256 
processor cores. In addition, it offers multiserver 
management and a new event-processing technol- 
ogy called StreamInsight. (ГЇЇ cover multiserver 
management and StreamInsight in more detail later 
in this article.) 

The other new edition of SQL Server 2008 R2 
is the Parallel Data Warehouse Edition. The Paral- 
lel Data Warehouse Edition, formerly code-named 
Madison, is a different animal than the other edi- 
tions of SQL Server 2008 R2. It's designed as a Plug 
and Play solution for large data warehouses. It's a 


combination hardware and software solution that's 
available only through select OEMs such as HP, 
Dell, and IBM. OEMs supply and preconfigure all 
the hardware, including the storage to support the 
data warehouse functionality. The Parallel Data 
Warehouse Edition uses a shared-nothing Massively 
Parallel Processing (MPP) architecture to support 
data warehouses from 10TB to hundreds of tera- 
bytes in size. As more scalability is required, addi- 
tional compute and storage nodes can be added. 

As you would expect, the Parallel Data Ware- 
house Edition is integrated with SQL Server Integra- 
tion Services (SSIS), SOL Server Analysis Services 
(SSAS), and SQL Server Reporting Services (SSRS). 
For more in-depth information about the SQL 
Server 2008 R2 Parallel Data Warehouse Edition, 
see “Getting Started with Parallel Data Warehouse,” 
page 39, InstantDoc ID 125098. 

The SQL Server 2008 R2 lineup includes 
e SQL Server 2008 R2 Parallel Data Warehouse 

Edition 
e SQL Server 2008 R2 Datacenter Edition 
• SQL Server 2008 R2 Enterprise Edition 
• SQL Server 2008 R2 Developer Edition 
e SQL Server 2008 R2 Standard Edition 
e SQL Server 2008 R2 Web Edition 
e SQL Server 2008 R2 Workgroup Edition 
e SQL Server 2008 R2 Express Edition (Free) 
e SQL Server 2008 Compact Edition (Free) 


More detailed information about the SQL Server 2008 
R2 editions, their pricing, and the features that they 
support can be found in Table 1. SQL Server 2008 R2 
supports upgrading from SQL Server 2000 and later. 
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or Up to 

essor Cores 

dware side, SQL Server 2008 R2 Data- 

ition now supports systems with up to 64 

rocessors and 256 cores. This support 

ter scalability in the x64 line than ever 

erver 2008 R2 Enterprise Edition sup- 

64 processors, and Standard Edition 

9 to four processors. 

rver 2008 R2 remains one of the few 

rver platforms that is still available in 

and 64-bit versions. I expect it will be 
version of SQL Server that Microsoft 


ot and Self-Service BI 

ot, the most publicized new feature 
2008 R2 is PowerPivot and self- 
QL Server 2008 R2’s PowerPivot for 
erly code-named Gemini) is essentially 
add-in that brings the SSAS engine into 
adds powerful data analysis capabili- 
Excel, the front-end data analysis tool that 
knowledge workers know and use on a daily basis. 
Built-in data compression enables PowerPivot for 
Excel to work with millions of rows and still deliver 
subsecond response time. As you would expect, 
PowerPivot for Excel can connect to SQL Server 
2008 databases, but it can also connect to previ- 
ous versions of SQL Server as well as other data 
sources, including Oracle and Teradata, and even 
SSRS reports. In addition to its data manipulation 
capabilities, PowerPivot for Excel also includes a 
new cube-oriented calculation language called Data 
Analysis Expressions (DAX), which extends Excel’s 
data analysis capabilities with the multidimensional 
capabilities of the MDX language. Figure 1 shows 
the new PowerPivot for Excel add-in being used to 
create a PowerPivot chart and PowerPivot table for 
data analysis. 

PowerPivot for SharePoint enables the sharing, 
collaboration, and management of PowerPivot work- 
sheets. From an IT perspective, the most important 
feature that PowerPivot for SharePoint offers is the 
ability to centrally store and manage business-criti- 
cal Excel worksheets. This functionality addresses a 
huge hole that plagues most businesses today. Critical 
business information is often kept in a multitude of 
Excel spreadsheets, and unlike business application 
databases, in the vast majority of cases these spread- 
sheets are unmanaged and often aren’t backed up or 
protected in any way. If they’re accidentally deleted or 
corrupted, there’s a resulting business impact that IT 
can't do anything about. Using SharePoint as a cen- 
tral storage and collaboration point facilitates sharing 
these important Excel spreadsheets, but perhaps more 
importantly, it provides a central storage location in 
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TABLE I: SQL Server 2008 R2 Edi 


rver 2008 R2 


Parallel Data Warehouse 


Datacenter 


Enterprise 


Developer 


Standard 


Web 


Workgroup 


Express Base 


Express with Tools 


Express with Advanced 
Services 


SQL SERVER 2008 R2 NEW FEATURES 


icing 


$57,498 per CPU 
Not offered via server CAL 


$57,498 per CPU 
Not offered via server CAL 


$28,749 per CPU 
$13,969 per server 
with 25 CALs 


$50 per developer 


$7,499 per CPU 
$1,849 per server 
with 5 CALs 


$15 per CPU per month 
Not offered via server CAL 


$3,899 per CPU 
$739 per server 
with 5 CALs 


Free 


Free 


Free 


tions 
| Significant Features 


MPP scale-out architecture 
BI—SSAS, SSIS, SSRS 


64 CPUs and up to 256 cores 
2TB of RAM 

16-node failover clustering 
Database mirroring 
Streamlnsight 

Multiserver management 
Master Data Services 
BI—SSAS, SSIS, SSRS 
PowerPivot for SharePoint 
Partitioning 

Resource Governor 

Online indexing and restore 
Backup compression 


64 CPUs and up to 256 cores 
2ТВ of RAM 

16-node failover clustering 
Database mirroring 
Multiserver management 
Master Data Services 
BI—SSAS, SSIS, SSRS 
PowerPivot for SharePoint 
Partitioning 

Resource Governor 

Online indexing and restore 
Backup compression 


Same as the Enterprise Edition 


4 CPUs 

2ТВ of RAM 

2-node failover clustering 
Database mirroring 
BI—SSAS, SSIS, SSRS 
Backup compression 


4 CPUs 
2TB of RAM 
BI—SSRS 


2 CPUs 
AGB of RAM 
BI—SSRS 


1 CPU 
1GB of RAM 


1 CPU 
1GB of RAM 


1 CPU 

1GB of RAM 

BI—SSRS (for the local 
instance) 
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Creating a PowerPivot chart and PowerPivot table for data analysis 
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A SQL Server Utility Control Point in the Utility Explorer 


which these critical Excel spreadsheets can be man- 


i SQL SERVER 2008 R2 NEW FEATURES 


Multiserver Management 

Some of the most important additions to SQL 
Server 2008 R2 on the relational database side 
are the new multiserver management capabili- 
ties. Prior to SQL Server 2008 R2, the multi- 
server management capabilities in SQL Server 
were limited. Sure, you could add multiple serv- 
ers to SQL Server Management Studio (SSMS), 
but there was no good way to perform similar 
tasks on multiple servers or to manage multiple 
servers as a group. SQL Server 2008 R2 includes 
a new Utility Explorer, which is part of SSMS, 
to meet this need. The Utility Explorer lets you 
create a SQL Server Utility Control Point where 
you can enlist multiple SQL Server instances to 
be managed, as shown in Figure 2. The Utility 
Explorer can manage as many as 25 SQL Server 
instances. 

The Utility Explorer displays consolidated 
performance, capacity, and asset information 
for all the registered servers. However, only SQL 
Server 2008 R2 instances can be managed with 
the initial release; support for earlier SQL Server 
versions is expected to be added with the first ser- 
vice pack. Note that multiserver management is 
available only in SQL Server 2008 R2 Enterprise 
Edition and Datacenter Edition. You can find 
out more about multiserver management at www 
.microsoft.com/sqlserver/2008/en/us/R2- 
multi-server.aspx. 


Master Data Services 

Master Data Services might be the most under- 
rated feature in SQL Server 2008 R2. It provides 
a platform that lets you create a master defini- 
tion for all the disparate data sources in your 
organization. Almost all large businesses have 
a variety of databases that are used by different 
applications and business units. These databases 
have different schema and data meanings for 
what's often the same data. This creates a prob- 
lem because there isn't one version of the truth 
throughout the enterprise, and businesses almost 
always want to bring disparate data together for 
centralized reporting, data analysis, and data 
mining. 

Master Data Services gives you the ability to cre- 
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aged and backed up by IT, providing the organization 
with a safety net for these documents that didn't exist 
before. PowerPivot for SharePoint is supported by 
SQL Server 2008 R2 Enterprise Edition and higher. 

As you might expect, the new PowerPivot func- 
tionality and self-service BI features require the lat- 
est versions of each product: SQL Server 2008 R2, 
Office 2010, and SharePoint 2010. You can find out 
more about PowerPivot and download it from www 
.powerpivot.com. 


ate a master data definition for the enterprise to map 
and convert data from all the different date sources 
into that central data repository. You can use Master 
Data Services to act as a corporate data hub, where 
it can serve as the authoritative source for enterprise 
data. Master Data Services can be managed using a 
web client, and it provides workflows that can notify 
assigned data owners of any data rule violations. 
Master Data Services is available in SQL Server 
2008 R2's Enterprise Edition and Datacenter Edition. 
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Streaminsight 

StreamInsight is a near real- 
time event monitoring and pro- 
cessing framework. It's designed 
to process thousands of events 
per second, selecting and writ- 
ing out pertinent data to a SQL 
Server database. This type of 
high-volume event processing 
is designed to process manufac- 
turing data, medical data, stock 
exchange data, or other process- 
control types of data streams 
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where your organization wants N arentreport server. 


to capture real-time data for 
data mining or reporting. 

StreamInsight is a program- 
ming framework and doesn't have 
a graphical interface. It's available only in SQL Server 
2008 R2 Datacenter Edition. You can read more 
about SQL Server 2008 R2’s StreamInsight technol- 
ogy at www.microsoft.com/sqlserver/2008/en/us/R2- 
complex-event.aspx. 


Figure 3 


Report Builder 3.0 

Not all businesses are diving into the analytical side 
of BI, but almost everyone has jumped onto the 
SSRS train. With SQL Server 2008 R2, Microsoft 
has released a new update to the Report Builder por- 
tion of SSRS. Report Builder 3.0 (shown in Figure 3) 
offers several improvements. Like Report Builder 2.0, 
it sports the Office Ribbon interface. You can integrate 
geospatial data into your reports using the new Map 
Wizard, and Report Builder 3.0 includes support for 
adding spikelines and data bars to your reports so that 
queries can be reused in multiple reports. In addition, 
you can create Shared Datasets and Report Parts that 
are reusable report items stored on the server. You can 
then incorporate these Shared Datasets and Report 
Parts in the other reports that you create. 


Other Important 

Enhancements 

Although SQL Server 2008 R2 had a short two- 

year development cycle, it includes too many new 

features to list in a single article. The following are 

some other notable enhancements included in SQL 

Server 2008 R2: 

e The installation of slipstream media containing 
current hotfixes and updates 

e The ability to create hot standby servers with 
database mirroring 
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The Report Designer 3.0 design surface 


e The ability to connect to and manage SQL Azure 
instances 

e The addition of SSRS support for SharePoint zones 

• The ability to create Report Parts that can be 
shared between multiple reports 

e The addition of backup compression to the 
Standard Edition 


You can learn more about the new features in SQL 
Server 2008 R2 at msdn.microsoft.com/en-us/library/ 
bb500435(SQL.105).aspx. 


To R2 or Not to R2? 

SQL Server 2008 R2 includes a tremendous amount 
of new functionality for an R2 release. Although the 
bulk of the new features, such as PowerPivot and 
the Parallel Data Warehouse, are BI oriented, there 
are also several significant new relational database 
enhancements, including multiserver management 
and Master Data Services. However, it remains to be 
seen how quickly businesses will adopt SQL Server 
2008 R2. All current Software Assurance (SA) cus- 
tomers are eligible for the new release at no addi- 
tional cost, but other customers will need to evaluate 
if the new features make the upgrade price worth- 
while. Perhaps more important than price are the 
resource demands needed to roll out new releases of 
core infrastructure servers such as SQL Server. 

That said, PowerPivot and self-service BI are poten- 
tially game changers, especially for organizations that 
have existing BI infrastructures. The value these fea- 
tures bring to organizations heavily invested in BI 
makes SQL Server 2008 R2 a must-have upgrade. В 
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Index ordering, parallelism, and ranking calculations 


C ertain aspects of SQL Server index B-trees 
and their use cases are common knowledge, 
but some aspects are less widely known 
because they fall into special cases. In this article 
I focus on special cases related to backward index 
ordering, and I provide guidelines and recommenda- 
tions regarding when to use descending indexes. All 
my examples use a table called Orders that resides 
in a database called Performance. Run the code in 
Listing 1 to create the sample database and table and 
populate it with sample data. Note that the code in 
Listing 1 is a subset of the source code I prepared for 
my book /nside Microsoft SOL Server 2008: T-SQL 
Querying (Microsoft Press, 2009), Chapter 4, Query 
Tuning. If you have the book and already created the 
Performance database in your system, you don't need 
to run the code in Listing 1. 

One of the widely understood aspects of SOL 
Server indexes is that the leaf level of an index enforces 
bidirectional ordering through a doubly-linked list. 
This means that in operations that can potentially rely 
on index ordering—for example, filtering (seek plus 
partial ordered scan), grouping (stream aggregate), 
presentation ordering (ORDER BY)—the index can 
be scanned either in an ordered forward or ordered 
backward fashion. So, for example, if you have a 
query with ORDER BY coll DESC, col2 DESC, 
SQL Server can rely on index ordering both when you 
create the index on a key list with ascending ordering 
(coll, col2) and with the exact reverse ordering (coll 
DESC, col2 DESC). 

So when do you need to use the DESC index key 
option? Ask SQL Server practitioners this question, 
and most of them will tell you that the use case is when 
there are at least two columns with opposite ordering 
requirements. For example, to support ORDER BY 
coll, col2 DESC, there's no escape from defining 
one of the keys in descending order—either (coll, 
col2 DESC), or the exact reverse order (coll DESC, 
col2). Although this is true, there's more to the use of 
descending indexes than what's commonly known. 


Index Ordering and Parallelism 
As it turns out, SOL Server's storage engine isn't 
coded to handle parallel backward index scans (as 
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of SQL Server 2008 SP1 Cumulative Update 6)—not 
because there's a technical problem or engineering 
difficulty with supporting the option, but simply 
because it hasn't yet floated as a customer request. 
My guess is that most DBAs just aren't aware of 
this behavior and therefore haven't thought to ask 
for it. Although performing a backward scan gives 
you the benefit of relying on index ordering and 
therefore avoiding expensive sorting or hashing, the 
query plan can't benefit from parallelism. If you 
find a case in which parallelism is important, you 
need to arrange an index that allows an ordered 
forward scan. 
Consider the following query as an example: 
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USE Performance; 


SELECT * 

FROM dbo.Orders 

WHERE orderid «- 100000 
ORDER BY orderdate; 


Î iORE on the WEB 


Download the listing at 
InstantDoc ID 125090. 


There’s a clustered index defined on the table with 
orderdate ascending as the key. The table has 
1,000,000 rows, and the number of qualifying rows 
in the query is 100,000. My laptop has eight logical 
CPUs. Figure 1 shows the graphical query plan for 
this query. Here’s the textual plan: 


|--Parallelism(Gather Streams, ORDER BY: 
([orderdate] ASC)) 

|--Clustered Index Scan(OBJECT:([idx cl od]), 
WHERE: ([orderid]«-(100000)) ORDERED 
FORWARD) 


As you can see, a parallel query plan was used. Now 
try the same query with descending ordering: 


SELECT * 

FROM dbo.Orders 

WHERE orderid «- 100000 
ORDER BY orderdate DESC; 


Figure 2 shows the graphical query plan for this 
query. Here's the textual plan: 


June 2010 21 


DESCENDING INDEXES 


LISTING I: Script to Create Sample Database and Tables 


SET NOCOUNT ON; 
USE master; 


IF DB ID('Performance') IS NULL 
CREATE DATABASE Performance; 


GO 
USE Performance; 
GO 


-- Creating and Populating the Nums Auxiliary Table 


SET NOCOUNT ON; 


IF OBJECT ID('dbo.Nums', 


DROP TABLE dbo.Nums; 


CREATE TABLE dbo.Nums(n INT NOT NULL PRIMARY KEY); 


DECLARE @max AS INT, @rc AS INT; 


SET Gmax = 1000000; 
SET @гс = 1; 


INSERT INTO dbo.Nums(n) VALUES(1) ; 


WHILE @rc * 2 <= @max 
BEGIN 


INSERT INTO dbo.Nums(n) SELECT n + @rc FROM dbo.Nums; 


SET @гс = @гс * 2; 
END 


INSERT 
SELECT 
GO 


IF OBJECT ID('dbo.Orders', 
DROP TABLE dbo.Orders; 


GO 


-- Data Distribution Settings 


DECLARE 
@numorders AS INT, 
@numcusts AS INT, 
@numemps AS INT, 
@numshippers AS INT, 
@numyears AS INT, 
@startdate AS DATETIME; 


NTO dbo.Nums (n) 
n + @rc FROM dbo.Nums WHERE n + @rc <= @max; 


SELECT 
Gnumorders = 1000000, 
@numcusts = 20000, 
@numemps = 500, 
Gnumshippers = 3 
Gnumyears = 4, 
Gstartdate = '20050101'; 


CREATE TABLE dbo.Orders 


2; 


CREATE CLUSTERED INDEX 


-- Creating and Populating the Orders Table 


ayia C 
U) 35 UOT BULL orderid INT NOT NULL, 
custid CHAR(11) NOT NULL, 
empid INT NOT NULL, 
shipperid VARCHAR (5) NOT NULL, 
orderdate DATETIME NOT NULL, 
filler CHAR(155) NOT NULL DEFAULT('a') 


idx cl od ON dbo.Orders(orderdate) ; 


ALTER TABLE dbo.Orders ADD 


CONSTRAINT PK Orders 


PRIMARY KEY NONCLUSTERED (orderid) ; 


INSERT INTO dbo.Orders WITH (TABLOCK) (orderid, custid, empid, shipperid, 


orderdate) 
SELECT n AS orderid, 


'C' + КІСНТС'000000000' 


+ CASTC 


1 + ABSCCHECKSUM(NEWID())) % @numcusts 
AS VARCHAR(10)), 10) AS custid, 


'U') IS NOT NULL 


1 + ABS(CHECKSUM(NEWID())) % Gnumemps AS empid, 
CHAR(ASCII('A') - 2 


+ 2 * (1 + ABSCCHECKSUM(NEWID())) % Gnumshippers)) AS shipperid, 


DATEADD(day, n 
-- late arrival 


/ (@numorders / (Gnumyears * 365.25)), @startdate) 
with earlier date 


- CASE WHEN n % 10 = Ø 


THEN 1 + ABS(CHECKSUM(NEWID())) ?6 38 


ELSE 0 


END AS orderdate 


FROM dbo.Nums 


WHERE n <= @numorders 


ORDER BY CHECKSUM(NEWID()); 


GO 


“| s kb 
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Parallel query plan 
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|--Clustered Index Scan(OBJECT: ([idx cl od]), 
WHERE: ([orderid]«- (100000)) ORDERED BACKWARD) 


Note that although an ordered scan of the index was 
used, the plan is serial because the scan ordering is 
backward. If you want to allow parallelism, the index 
must be scanned in an ordered forward fashion. So 
in this case, the orderdate column must be defined 
with DESC ordering in the index key list. 

This reminds me that when descending indexes 
were introduced in SQL Server 2000 RTM, my friend 
Wayne Snyder discovered an interesting bug. Suppose 
you had a descending clustered index on the Orders 
table and issued the following query: 


DELETE FROM dbo.Orders WHERE orderdate 
< '20050101' ; 


Instead of deleting the rows before 2005, SQL Server 
deleted all rows after January 1, 2005! Fortunately, 


Serial query plan 


Wayne discovered this bug and reported it, and it was 
fixed in SQL Server 2000 SP1. 


Index Ordering and Ranking 
Calculations 

Back to cases in which descending indexes are 
relevant, it appears that ranking calculations—par- 
ticularly ones that have a PARTITION BY clause— 
need to perform an ordered forward scan of the index 
in order to avoid the need to sort the data. Again, this 
is the case only when the calculation is partitioned. 
When the calculation isn't partitioned, both a forward 
and backward scan can be utilized. Consider the fol- 
lowing example: 


SELECT 
ROW NUMBER() OVER(ORDER BY orderdate DESC, 
orderid DESC) AS RowNum, 
orderid, orderdate, custid, filler 
FROM dbo.Orders; 
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actually sort the data, as the query execution plan 
in Figure 3 shows. Here's the textual form of the Figure 4 


plan: 


|--Sequence Project (DEFINE: ([Expr1004] 
-row number)) 
| -- Segment 
|--Parallelism(Gather Streams, ORDER 
BY:([orderdate] DESC, [orderid] DESC)) 
|--Sort(ORDER BY:([orderdate] DESC, [orderid] 
DESC)) 
|--Clustered Index Scan(OBJECT: ([idx cl od])) 


Indexing guidelines for queries with nonpartitioned 


ranking calculations are to have the ranking ordering 
columns in the index key list, either in specified 


о 


MANIPULATE 
THE DATA}: 


о 


un 


Query plan without sort for nonpartitioned ranking calculation 


order or exactly reversed, plus include the rest of the 
columns from the query in the INCLUDE clause for 
coverage purposes. With this in mind, to support the 
previous query you can define the index with all the 
keys in ascending order, like so: 


CREATE UNIQUE INDEX idx od oid i cid filler 
ON dbo.Orders(orderdate, orderid) 
INCLUDE(custid, filler); 


Rerun the query, and observe in the query execution 
plan in Figure 4 that the index was scanned in an 
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Query plan with sort for partitioned ranking calculation 
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ordered backward fashion. Here's the textual form 
of the plan: 


|--Sequence Project (DEFINE: ([Ехрг1004]=гом number)) 
|--Segment 
|--Index Scan(OBJECT: ([idx_od_oid_i_cid_ 
filler]), ORDERED BACKWARD) 


However, when partitioning is involved in the 
ranking calculation, it appears that SQL Server is 
strict about the ordering requirement—it must match 
the ordering in the expression. For example, consider 
the following query: 


SELECT 
ROW_NUMBER() OVER(PARTITION BY custid 
ORDER BY orderdate DESC, 
orderid DESC) AS RowNum, 
orderid, orderdate, custid, filler 
FROM dbo.Orders; 


When partitioning is involved, the indexing guidelines 
are to put the partitioning columns first in the key 
list, and the rest is the same as the guidelines for 
nonpartitioned calculations. Now try to create an 
index following these guidelines, but have the ordering 
columns appear in ascending order in the key list: 


CREATE UNIQUE INDEX idx cid od oid i filler 
ON dbo.Orders(custid, orderdate, orderid) 
INCLUDE(fi ller) ; 


Observe in the query execution plan in Figure 5 
that the optimizer didn't rely on index ordering but 
instead sorted the data. Here's the textual form of 
the plan: 


|--Parallelism(Gather Streams) 
| --Index Insert(0BJECT: ([idx cid od oid i. 
filler])) 
|--Sort (ORDER BY:([custid] ASC, [orderdate] 
ASC, [orderid] ASC) PARTITION 
ID: ([custid])) 


If you want to avoid sorting, you need to 
arrange an index that matches the ordering in 
the ranking calculation exactly, like so: 


CREATE UNIQUE INDEX idx_cid_odD_oidD_i_ 
filler 
ON dbo.Orders(custid, orderdate DESC, 
orderid DESC) 
INCLUDE (fi1 ler) ; 


Examine the query execution plan in Figure 6, and 
observe that the index was scanned in an ordered 
forward fashion and a sort was avoided. Here’s the 
textual plan: 


|--Sequence Project (DEFINE: ([Expr1004]-row . 
number) ) 
|--Segment 
|--Index Scan(OBJECT:C[idx cid odD. 
oidD i filler]), ORDERED FORWARD) 


When you're done, run the following code for 
cleanup: 


DROP INDEX dbo.Orders.idx od oid i cid. 
filler; 
DROP INDEX dbo.Orders.idx cid od oid i. 
filler; 
DROP INDEX dbo.Orders.idx cid odD oidD i filler; 


One More Time 
In this article I covered the usefulness of descending 
indexes. I described the cases in which index ordering 
can be relied on in both forward and backward linked 
list order, as opposed to cases that support only forward 
direction. I explained that partitioned ranking calcula- 
tions can benefit from index ordering only when an 
ordered forward scan is used, and therefore to benefit 
from index ordering you need to create an index in 
which the key column ordering matches that of the 
ORDER BY elements in the ranking calculation. I also 
explained that even when backward scans in an index 
are supported, this prevents parallelism; so even in those 
cases there might be benefit in arranging an index that 
matches the ordering requirements exactly rather than 
in reverse. SQL] 
InstantDoc ID 125090 
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Transactional 


Replication 


3 common transactional replication problems solved 


ransactional replication is a useful way to 
i keep schema and data for specific objects 
synchronized across multiple SQL Server 
databases. Replication can be used in simple scenarios 
involving a few servers or can be scaled up to complex, 
multi-datacenter distributed environments. However, 
no matter the size or complexity of your topology, 
the number of moving parts involved with replica- 
tion means that occasionally problems will occur that 
require a DBAS intervention to correct. 

In this article, ГЇЇ show you how to use SQL Server's 
native tools to monitor replication performance, 
receive notification when problems occur, and diagnose 
the cause of those problems. Additionally, ГЇЇ look at 
three common transactional replication problems and 
explain how to fix them. 


A View into Replication Health 
Replication Monitor is the primary GUI tool at your 
disposal for viewing replication performance and diag- 
nosing problems. Replication Monitor was included in 
Enterprise Manager in SQL Server 2000, but in SQL 
Server 2005, Replication Monitor was separated from 
SQL Server Management Studio (SSMS) into a stand- 
alone executable. Just like SSMS, Replication Monitor 
can be used to monitor Publishers, Subscribers, and 
Distributors running previous versions of SQL Server, 
although features not present in SQL Server 2005 won't 
be displayed or otherwise available for use. 

To launch Replication Monitor, open SSMS, con- 
nect to a Publisher in the Object Explorer, right-click 
the Replication folder, and choose Launch Replica- 
tion Monitor from the context menu. Figure 1 shows 
Replication Monitor with several registered Publishers 
added. Replication Monitor displays a treeview in the 
left pane that lists Publishers that have been registered; 
the right pane's contents change depending on what's 
selected in the treeview. 

Selecting a Publisher in the treeview shows three 
tabbed views in the right pane: Publications, which 
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shows the name, current status, and number of 
Subscribers for each publication on the Publisher; 
Subscription Watch List, which shows the status and 
estimated latency (1.е., time to deliver pending com- 
mands) of all Subscriptions to the Publisher; and 
Agents, which shows the last start time and current 
status of the Snapshot, Log Reader, and Queue Reader 
agents, as well as various automated maintenance jobs 
created by SQL Server to keep replication healthy. 

Expanding a Publisher node in the treeview shows 
its publications. Selecting a publication displays four 
tabbed views in the right pane: All Subscriptions, 
which shows the current status and estimated latency 
of the Distribution Agent for each Subscription; Tracer 
Tokens, which shows the status of recent tracer tokens 
for the publication (ГЇЇ discuss tracer tokens in more 
detail later); Agents, which shows the last 
start time, run duration, and current status of 
the Snapshot and Log Reader agents used by 
the publication; and Warnings, which shows 
the settings for all warnings that have been 
configured for the publication. 

Right-clicking any row (1.е., agent) in the Subscrip- 
tion Watch List, All Subscriptions, or Agents tabs 
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will display a context menu with options that include 
stopping and starting the agent, viewing the agent’s 
profile, and viewing the agent's job properties. Double- 
clicking an agent will open a new window that shows 
specific details about the agent’s status. 

Distribution Agent windows have three tabs: 
Publisher to Distributor History, which shows the 
status and recent history of the Log Reader agent for 
the publication; Distributor to Subscriber History, 
which shows the status and recent history of the Dis- 
tribution Agent; and Undistributed Commands, which 
shows the number of commands at the distribution 
database waiting to be applied to the Subscriber and 
an estimate of how long it will take to apply them. Log 
Reader and Snapshot Reader agent windows show 
only an Agent History tab, which displays the status 
and recent history of that agent. 

When a problem occurs with replication, such 
as when a Distribution Agent fails, the icons for 
the Publisher, Publication, and agent will change 
depending on the type of problem. Icons overlaid by 
a red circle with an X indicate an agent has failed, a 
white circle with a circular arrow indicates an agent 
is retrying a command, and a yellow caution symbol 
indicates a warning. Identifying the problematic 
agent is simply a matter of expanding in the treeview 
the Publishers and Publications that are alerting to 
a condition, selecting the tabs in the right pane for 
the agent(s) with a problem, and double-clicking the 
agent to view its status and information about the 
error. 


Measuring the Flow of Data 

Understanding how long it takes for data to move 
through each step is especially useful when trouble- 
shooting latency issues and will let you focus your atten- 
tion on the specific segment that’s problematic. Tracer 
tokens were added in SQL Server 2005 to measure the 
flow of data and actual latency from a Publisher all the 
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Tracer Tokens tab showing latency details for a token 


28 June 2010 


way through to Subscribers (the latency values shown 
for agents in Replication Monitor are estimated). 
Creating a tracer token writes a special marker to the 
transaction log of the Publication database that’s read 
by the Log Reader agent, written to the distribution 
database, and sent through to all Subscribers. The time 
it takes for the token to move through each step is saved 
in the Distribution database. 

Tracer tokens can be used only if both the Pub- 
lisher and Distributor are on SQL Server 2005 or 
later. Subscriber statistics will be collected for push 
subscriptions if the Subscriber is running SQL Server 
7.0 or later and for pull subscriptions if the Subscriber 
is running SQL Server 2005 or higher. For Subscribers 
that don't meet these criteria (non-SQL Server Sub- 
scribers, for example), statistics for tracer tokens will 
still be gathered from the Publisher and Distributor. 
To add a tracer token you must be a member of the 
sysadmin fixed server role or db. owner fixed database 
role on the Publisher. 

To add a new tracer token or view the status of 
existing tracer tokens, navigate to the Tracer Tokens 
tab in Replication Monitor. Figure 2 shows an 
example of the Tracer Tokens tab showing latency 
details for a previously inserted token. To add a new 
token, click Insert Tracer. Details for existing tokens 
can be viewed by selecting from the drop-down list on 
the right. 


Know When There Are Problems 
Although Replication Monitor is useful for viewing 
replication health, it's not likely (or even reasonable) 
that you'll keep it open all the time waiting for an error 
to occur. After all, as a busy DBA you have more to 
do than watch a screen all day, and at some point you 
have to leave your desk. 

However, SQL Server can be configured to raise 
alerts when specific replication problems occur. When 
a Distributor is initially set up, a default group of 
alerts for replication-related events is created. To view 
the list of alerts, open SSMS and make a connection 
to the Distributor in Object Explorer, then expand the 
SQL Server Agent and Alerts nodes in the treeview. 
To view or configure an alert, open the Alert proper- 
ties window by double-clicking the alert or right-click 
the alert and choose the Properties option from the 
context menu. Alternatively, alerts can be configured 
in Replication Monitor by selecting a Publication in 
the left pane, viewing the Warnings tab in the right 
pane, and clicking the Configure Alerts button. 
The options the alert properties window offers for 
response actions, notification, etc., are the same as an 
alert for a SQL Server agent job. Figure 3 shows the 
Warnings tab in Replication Monitor. 

There are three alerts that are of specific interest 
for transactional replication: Replication: Agent failure, 
Replication: Agent retry, and Replication Warning: 
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Transactional replication latency (Threshold: latency). 
By default, only the latency threshold alerts are 
enabled (but aren’t configured to notify an operator). 
The thresholds for latency alerts are configured in 
the Warnings tab for a Publication in Replication 
Monitor. These thresholds will trigger an alert if 
exceeded and are also used by Replication Monitor to 
determine if an alert icon is displayed on the screen. 
In most cases, the default values for latency alerts are 
sufficient, but you should review them to make sure 
they meet the SLAs and SLEs you're responsible for. 

A typical replication alert response is to send a 
notification (e.g., an email message) to a member of 
the DBA team. Because email alerts rely on Database 
Mail, you'll need to configure that first if you haven't 
done so already. Also, to avoid getting inundated 
with alerts, you'll want to change the delay between 
responses to five minutes or more. Finally, be sure 
to enable the alert on the General page of the Alert 
properties window. 

Changes to alerts are applied to the Distributor and 
affect all Publishers that use the Distributor. Changes 
to alert thresholds are applied only to the selected 
Publication and can't be applied on a Subscriber-by- 
Subscriber basis. 


All Subscriptions | Tracer Tokens | Agents +' 


Warnings 
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Figure 3 


Replication Monitor's Warnings tab 


LISTING |: Code to Acquire the Publisher's Database ID 


SELECT DISTINCT 
subscriptions.publisher database id 
Sys.servers AS [publishers] 
INNER JOIN distribution.dbo.MSpublications AS [publications] ON 
publishers.server id - publications.publisher id 
INNER JOIN distribution.dbo.MSarticles AS [articles] ON publications 
.publication id = articles.publication id 
INNER JOIN distribution.dbo.MSsubscriptions AS [subscriptions] ON 
articles.article id = subscriptions.article id 
AND articles.publication id = subscriptions.publication id 
AND articles.publisher db = subscriptions.publisher db 
AND articles.publisher id = subscriptions.publisher id 
INNER JOIN sys.servers AS [subscribers] ON subscriptions.subscriber id 
= subscribers.server id 
WHERE  publishers.name = 'MyPublisher' 
AND publications.publication = 'MyPublication' 


FROM 


Other Potential Problems to 
Keep an Eye On 


Two other problems can creep up that neither alerts 


nor Replication Monitor will bring to your atten- 
tion: agents that are stopped, and unchecked growth 
of the distribution database on the Distributor. 

A common configuration option is to run agents 
continuously (or Start automatically when SQL Server 
Agent starts). Occasionally, they might need to be 
stopped, but if they aren't restarted, you can end up 
with transactions that accumulate at the Distributor 
waiting to be applied to the Subscriber or, if the log 
reader agent was stopped, transaction log growth at 
the Publisher. The estimated latency values displayed 
in Replication Monitor are based on current perfor- 
mance if the agent is running, or the agent's most 
recent history if it’s stopped. If the agent was below 
the latency alert threshold at the time it was stopped, 
then a latency alert won't be triggered and Replication 
Monitor wont show an alert icon. 

The dbo.Admin Start. Idle Кері Agents stored 
procedure in Web Listing 1 (www.sglmag.com, 
InstantDoc ID 104703) can be applied to the Dis- 
tributor (and Subscribers with pull subscriptions) 
and used to restart replication agents that are 
scheduled to run continuously but aren't currently 
running. Scheduling this procedure to run periodi- 
cally (e.g., every six hours) will prevent idle agents 
from turning into bigger problems. 

Unchecked growth of the distribution database 
on the Distributor can still occur when all agents are 
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AND subscribers.name - 'MySubscriber' 


running. Once commands have been delivered to all 
Subscribers, they need to be removed to free space 
for new commands. When the Distributor is initially 
set up, a SQL Server Agent job named Distribution 
clean up: distribution is created to remove com- 
mands that have been delivered to all Subscribers. 
If the job is disabled or isn't running properly (e.g., 
is blocked) commands won't be removed and the 
distribution database will grow. Reviewing this job's 
history and the size of the distribution database for 
every Distributor should be part of а DBAS daily 
checklist. 


Common Problems and Solutions 
Now that you have the tools in place to monitor 
performance and know when problems occur, let's 
take a look at three common transactional replication 
problems and how to fix them. 

Distribution Agents fail with the error message The 
row was not found at the Subscriber when applying the 
replicated command or Violation of PRIMARY KEY 
constraint [ Primary Key Name]. Cannot insert dupli- 
cate key in object [ Object Name]. 

Cause: By default, replication delivers com- 
mands to Subscribers one row at a time (but as 


June 2010 29 


TROUBLESHOOTING TRANSACTIONAL REPLICATION 


part of a batch wrapped by a transaction) and 
uses @@rowcount to verify that only one row was 
affected. The primary key is used to check for which 
row needs to be inserted, updated, or deleted; for 
inserts, if a row with the primary key already exists 
at the Subscriber, the command will fail because 
of a primary key constraint violation. For updates 
or deletes, if no matching primary key exists, 
@@rowcount returns 0 and an error will be raised 
that causes the Distribution Agent to fail. 

Solution: If you don’t care which command is 
failing, you can simply change the Distribution 
Agent’s profile to ignore the errors. To change the 
profile, navigate to the Publication in Replication 
Monitor, right-click the problematic Subscriber in the 


3} View connection properties 


f 


All Subscriptions tab, and choose the Agent Profile 
menu option. A new window will open that lets you 
change the selected agent profile; select the check box 
for the Continue on data consistency errors profile, 
and then click OK. Figure 4 shows an example of 
the Agent Profile window with this profile selected. 
The Distribution Agent needs to be restarted for 
the new profile to take effect; to do so, right-click 
the Subscriber and choose the Stop Synchronizing 
menu option. When the Subscriber’s status changes 
from Running to Not Running, right-click the Sub- 
scriber again and select the Start Synchronizing menu 
option. 

This profile is a system-created profile that will 
skip three specific errors: inserting a row with a 
duplicate key, constraint violations, and rows missing 
from the Subscriber. If any of these errors occur while 


GB spent rome = EI remi using this profile, the Distribution Agent will move 
ЕТ тетри [So ~ нр оп to the next command rather than failing. When 
m choosing this profile, be aware that the data on the 
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be executed at the Distributor. Three parameters are 
required: an ID for the Publisher database, a transac- 
tion sequence number, and a command ID. To get the 
Publisher database ID, execute the code in Listing 1 
on your Distributor (filling in the appropriate values 


Connection for Publisher, Subscriber, and Publication). To get the 
Server: transaction sequence number and command ID, navi- 
Connection: gate to the failing agent in Replication Monitor, open 


its status window, select the Distributor to Subscriber 
History tab, and select the most recent session with 


— an Error status. The transaction sequence number 
and command ID are contained in the error details 
message. Figure 5 shows an example of an error 

_ саке | message containing these two values. 
* Finally, execute the code in Listing 2 using the 
Figure 4 


Continue on data consistency errors profile selected in the Distribution Agent's profile 


[Error details or message of the selected session: 


: 00000001900001926000800000000, Command ID: 1) 


messages: 
The row was not found at the Subscriber when applying the replicated command. (Source: MSSQLServer, Error number: 20598) 
Get help: http //help/20598 
The row was not found at the Subscriber when applying the replicated command. (Source: MSSGLServer, Error number: 20598) 


et bale: оне. / fele /ЭГЕОО 


Figure 5 


An error message containing the transaction sequence number and command ID 


LISTING 2: Code to Show the Command 
that's Failing at the Subscriber 


values you just retrieved to show the command that's 
failing at the Subscriber. Once you know the com- 
mand that's failing, you can make changes at the 
Subscriber for the command to apply successfully. 

Distribution Agent fails with the error message 
Could not find stored procedure 'sp MSins . 

Cause: The Publication is configured to deliver 
INSERT, UPDATE, and DELETE commands using 
stored procedures, and the procedures have been 
dropped from the Subscriber. Replication stored 
procedures aren't considered to be system stored 
procedures and can be included using schema com- 
parison tools. If the tools are used to move changes 
from a non-replicated version of a Subscriber database 


to a replicated version (e.g., migrating schema changes 
from a local development environment to a test envi- 
ronment) the procedures could be dropped because 
they don't exist in the non-replicated version. 


EXECUTE distribution.dbo.sp browsereplcmds 
Gxact seqno start = '0x0000001900001926000800000000' , 
Gxact seqno end = '0x0000001900001926000800000000', 
Gpublisher database id - 29, 
@command_id = 1 
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Solution: This is an easy problem to fix. In the 
published database on the Publisher, execute the 
sp_scriptPublicationcustomprocs stored procedure 
to generate the INSERT, UPDATE, and DELETE 
stored procedures for the Publication. This procedure 
only takes one parameter—the name of the Publica- 
tion—and returns a single nvarchar(4000) column as 
the result set. When executed in SSMS, make sure 
to output results to text (navigate to Control-T or 
Query Menu, Results To, Results To Text) and that 
the maximum number of characters for results to 
text is set to at least 8,000. You can set this value by 
selecting Tools, Options, Query Results, Results to 
Text, Maximum number of characters displayed in 
each column. After executing the stored procedure, 
copy the scripts that were generated into a new query 
window and execute them in the subscribed database 
on the Subscriber. 

Distribution Agents won’t start or don’t appear to 
do anything. 

Cause: This typically happens when a large 
number of Distribution Agents are running on the 
same server at the same time; for example, on a 
Distributor that handles more than 50 Publications 
or Subscriptions. Distribution Agents are indepen- 
dent executables that run outside of the SQL Server 
process in a non-interactive fashion (1.е., no GUI). 
Windows Server uses a special area of memory 
called the non-interactive desktop heap to run these 
kinds of processes. If Windows runs out of available 
memory in this heap, Distribution Agents won’t be 
able to start. 

Solution: Fixing the problem involves making 
a registry change to increase the size of the non- 
interactive desktop heap on the server experi- 
encing the problem (usually the Distributor) and 
rebooting. However, it’s important to note that 
modifying the registry can result in serious prob- 
lems if it isn’t done correctly. Be sure to perform the 


(ie, making it a value of 768 or 1,024) should 
be sufficient to resolve the issue. Click OK after 
modifying the value. Rebooting will ensure that the 
new value is used by Windows. For more informa- 
tion about the non-interactive desktop heap, see 
"Unexpected behavior occurs when you run many 
processes on a computer that is running SQL Server" 
(support.microsoft.com/kb/824422). 


Monitoring Your Replication 
Environment 
When used together, Replication Monitor, tracer 
tokens, and alerts are a solid way for you to monitor 
your replication topology and understand the 
source of problems when they occur. Although the 
techniques outlined here offer guidance about how 
to resolve some of the more common issues that 
occur with transactional replication, there simply 
isn't enough room to cover all the known problems 
in one article. For more tips about troubleshooting 
replication problems, visit the Microsoft SQL Server 
Replication Support Team's REPLTalk blog at 
blogs.msdn.com/repltalk. ЕП 
InstantDoc ID 104703 
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following steps carefully and back up the registry 
before you modify it: 

1. Start the Registry Editor by typing regedit32 
.exe in a run dialog box or command prompt. 

2. Navigate to the HKEY LOCAL - 
MACHINE\SYSTEM\CurrentControlSet\Control\ 
Session Manager\SubSystems key in the left pane. 

3. In the right pane, double-click the Windows 
value to open the Edit String dialog box. 

4. Locate the SharedSection parameter in 
the Value data input box. It has three values 
separated by commas, and should look like the 
following: 
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SharedSection=1024 , 3072 ,512 


The desktop heap is the third value (512 in this 
example). Increasing the value by 256 or 512 
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Use this technique to speed up your reports 


nly static reports saved as pre-rendered 

images—snapshot reports—can be loaded 

and displayed (almost) instantly, so users 
are accustomed to some delay when they ask for 
reports that reflect current data. Some reports, how- 
ever, can take much longer to generate than others. 
Complex or in-depth reports can take many hours to 
produce, even on powerful systems, while others can 
be built and rendered in a few seconds. Parameter- 
driven expressions, a technique that I expect is new 
to many of you, can help you greatly in speeding up 
your reports. 

Visit the online version of this article at www 
.sqimag.com, InstantDoc ID 125092, to down- 
load the example I created against the Adventure- 
Works2008 database. If you’re interested in a more 
general look at improving your report performance, 
see the web-exclusive sidebar “Optimizing SSRS 
Operations,” which offers more strategies for creating 
reports that perform well. In this sidebar, I walk you 
through the steps your system goes through when it 
creates a report. I also share strategies, such as using 
a stored procedure to return the rowsets used in a 
report so that the SQL Server query optimizer can 
reuse a cached query plan, eliminating the need to 
recompile the procedure. 

The concepts ГЇЇ discuss here aren’t dependent 
on any particular version of SQL Server Reporting 
Services (SSRS) but Pll be using the 2008 ver- 
sion for the examples. Once you've installed the 
AdventureWorks2008 database, you'll start Visual 
Studio (VS) 2008 and load the ClientSide Filtering 
.sIn project. (This technique will work with VS 
2005 business intelligence—BI—projects, but I built 
the example report using VS 2008 and you cant 
load it in VS 2005 because the Report Definition 
Language—RDL—format is different.) Open Shared 
Data Source and the Project Properties to make sure 
the connection string points to your SSRS instance. 

The example report captures parameters from the 
user to focus the view on a specific class of bicycles, 
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such as mountain bikes. Once the user chooses a 
specific bike from within the class, a subreport is gen- 
erated to show details, including a photograph and 
other computed information. By separating the photo 
and computed information from the base query, you 
can help the report processor generate the base report 
much more quickly. 

In this example, my primary goal is to help the 
user focus on a specific subset of the data—in other 
words, to help users view only information in which 
they're interested. You can do this several ways, but 
typically you either add a parameter-driven WHERE 
clause to the initial query or parameter-driven filters 
to the report data regions. ГЇЇ do the latter in this 
example. 

Because the initial SELECT query exe- 
cuted by the report processor in this example 
doesn't include a WHERE clause, it makes 
sense to capture several parameters that the 


Report Data 


New ~ Edit 


xX eè 
0-08 Built-in Fields 
=) ig Parameters 
77 ProductLineWanted 
& PriceLow 
@ HighestPrice 
Ё@ ColorWanted 
=} {ay Images 
==] BillsBicycleWorksLogol 
= & AdventureWorks2008ClientFiltering 
3-0 ValidColors 
E] Color 
=.) DataSetl 
==) ProductName 
=) ProductNumber 
==] Color 
<=} ProductLine 
==] Class 
=] Style 
=] ListPrice 


Figure | 


The report’s Design pane 
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report processor can use to narrow the report’s focus. 
(There’s nothing to stop you from further refining 
the initial SELECT to include parameter-driven 
WHERE clause filtering.) ГЇЇ set up some report 
parameters—as opposed to query parameters—to 
accomplish this goal. 

1. In VS's Business Intelligence Development 
Studio (BIDS) Report Designer, navigate to the 
report's Design pane, which Figure 1 shows. Note 
that report-centric dialog boxes such as the Report 


By separating the photo and 


computed information from the 
base query, you can help the report 
processor generate the base report 


General 


Available Values 


Default Values 


Advanced 


much more quickly. 


Data window only appear when focus is set to the 
Design pane. 

2. Use the View menu to open the Report Data 
dialog box, which is new in the BIDS 2008 Report 
Designer. This box names each of the columns 
returned by the dataset that's referenced by (the 
case-sensitive) name. If you add columns to the 
dataset for some reason, make sure these changes 
are reflected in the Report Data dialog box as well 
as on your report. 

Don't expect to be able to alter the RDL (such 
as renaming the dataset) based on changes in the 
Report Data dialog box. When you rename the 
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query or change the columns being fetched, the 
designer doesn’t keep in sync with the RDL very 
well. Be prepared to open the RDL to make changes 
from time to time. 

3. Right-click the Parameters folder in the 
Report Data dialog and choose Add Parameter 
to open the Report Parameter Properties dialog 
box, which Figure 2 shows. Here’s where each of 
the parameters used in the filter expressions or the 
query parameters are defined. The query param- 
eters should be generated automatically if your 
query references a parameter in the WHERE clause 
once the dataset is bound to a data region on the 
report (such as a Tablix report element). 

4. Use the Report Parameter Properties dialog 
box to name and define the prompt and other 
attributes for each of the report parameters you'll 
need to filter the report. Figure 2 shows how I set 
the values, which are used to provide the user with 
a drop-down list of available Product Lines from 
which to choose. 

Note that the Value setting is un-typed—you 
can't specify a length or data type for these values. 
This can become a concern when you try to com- 
pare the supplied value with the data column values 
in the Filters expression I’m about to build, espe- 
cially when the supplied parameter length doesn't 
match the length of the data value being tested. 

5. T visited the Default Value tab of the Report 
Parameter Properties dialog box to set the param- 
eter default to M (for Mountain Bikes). If all of 
your parameters have default values set, the report 
processor doesn't wait before rendering the report 
when first invoked. This can be a problem if you 
can't determine a viable default parameter configu- 
ration that makes sense. 

6. The next step is to get the report processor 
to filter the report data based on the parameter 
value. On the Report Designer Design pane, 
click anywhere on the Tablel Tablix control and 
right-click the upper left corner of the column- 
chooser frame that appears. The trick is to make 
sure you've selected the correct element before 
you start searching for the properties, and you 
should be sure to choose the correct region when 
selecting a report element property page. It's easy 
to get them confused. 

7. Navigate to the Tablix Properties menu item, 
as Figure 3 shows. Here, you should add one or 
more parameter-driven filters to focus the report's 
data. Start by clicking Add to create a new Filter. 

8. Ignore the Dataset column drop-down list 
because it will only lead to frustration. Just click the 

fx expression button. This opens an expression 
editor where we're going to build a Boolean- 
returning expression that tests for the filter values. 
It's far easier to write your own expressions that 
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resolve to True or False than to get the report pro- 
cessor to keep the data types of the left and right 
side of the expressions straight. This is a big short- 
coming, but there’s a fairly easy workaround that 
Tm using for this example. 

9. You can use the Expression editor dialog 
box, which Figure 4 shows, to build simple 
expressions or reference other, more com- 
plex expressions that you’ve written as Visual 
Basic and embedded in the report. (The SSRS 
and ReportViewer report processors only 
know how to interpret simple Visual Basic 
expressions—not C# or any other language.) 
Virtually every property exposed by the report 
or the report elements can be configured to 
accept a runtime property instead of fixed value. 
Enter the expression shown in Figure 4. This 
compares the user-supplied parameter value 
(Parameters!ProductLineWanted. Value) with 
the Dataset column (Fields!ProductLine. Value). 
Click OK to accept this entry. Be sure to strip 
any space from these arguments using Trim so 
that strings that are different lengths will com- 
pare properly. 

10. Back in the Tablix Properties page, the 
expression text now appears as <<Expr>>, as 
Figure 5 shows. (This abbreviation, which hides 
the expression, isn’t particularly useful. I’ve asked 
Microsoft’s BI tools team to fix it.) Now ГЇЇ set the 
Value expression. Enter =True in the expression’s 
Value field. Don’t forget to prefix the value with the 
equals sign (=). This approach might be a bit more 
trouble, but it’s far easier to simply use Boolean 
expressions than to deal with the facts that the RDL 
parameters aren’t typed (despite the type drop-down 
list) and that the report processor logic doesn’t auto- 
type like you might expect because it’s interpreting 
Visual Basic. 

11. You're now ready to test the report. Simply 
click the Preview tab. This doesn't deploy the report 
to SSRS—it just uses the Report Designer's built-in 
report processor to show you an approximation of 
what the report should look like when it's deployed. 

12. If the report works, you're ready to deploy it 
to the SSRS server. Right-click the project to access 
the Properties page, which Figure 6 shows. Fill in 
the Target Report Folder and Target Server URL. 
Of course, these values will be up to your report 
admin to determine. In my case, I’m deploying to 
a folder called HHG Examples and targeting my 
Betavl/ReportServer's SS2K8 instance. You should 
note that the instance names are separated from the 
ReportServer virtual directory by an underscore in 
the 2008 version and a dollar sign in the 2000 and 
2005 versions. 

13. Right-click the report you want to deploy and 
choose Deploy. This should connect to the targeted 
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SSRS instance and save the report to the SSRS catalog. 
Deploying the report could take 30 seconds or longer 
the first time, as I’ve already discussed, so be patient. 


Creating a Snapshot Report 

Now that the report is deployed, you need to navigate 
to it with Report Manager so that you can generate a 
snapshot. Use Report Manager to open the deployed 
report’s properties pages. 
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1. Using Internet Explorer (I haven't had much 
luck with Firefox), browse to your SSRS Folder 
.aspx page, as Figure 7 shows. Your report should 
appear in the report folder where you deployed it. 

2. Find your report and click it. The report 
should render (correctly) in the browser window, 
and this is how your end users should see the 
report. Report users shouldn't be permitted to see 
or alter the report parameters, however— be sure to 
configure the report user rights before going into 
production. The following operations assume that 
you have the appropriate rights. 

3. Using the Parameters tab of the report prop- 
erty dialog box, you can modify the default values 
assigned to the report, hide them, and change the 
prompt strings. More importantly, you'll need to set 
up the report to use stored login credentials before 
you can create a snapshot. 

4. Navigate to the Data Sources tab of the report 
properties page. I configured the report to use a 
custom data source that has credentials that are 
securely stored in the report server. This permits 
the report to be run unattended if necessary (such 
as when you set up a schedule to regenerate the 
snapshot). In this case, I’ve created a special SQL 
Server login that has been granted very limited 
rights to execute the specific stored procedure being 
used by the report and query against the appropriate 
tables, but nothing else. 

5. Next, navigate to the Execution tab of the 
report property dialog box. Choose Render this 
report from a report execution snapshot. Here's 
where you can define a schedule to rebuild the 
snapshot. This makes sense for reports that take 
a long time to execute, because when the report is 
requested, the last saved version is returned, not a 
new execution. 

6. Check the Create a report snapshot when you 
click the Apply button on this page box and click 
Apply. This starts the report processor on the desig- 
nated SSRS instance and creates a snapshot of the 
report. The next time the report is invoked from a 
browser, the data retrieved when the snapshot was 
built is reused—no additional queries are required 
to render the report. It also means that as the user 
changes the filter parameters, it's still not neces- 
sary to re-run the query. This can save considerable 
time—especially in cases where the queries require a 
long time to run. 


Handling More Complex 
Expressions 

I mentioned that the report processor can handle 
both Visual Basic expressions embedded in the 
report and CLR-based DLL assemblies. It's easy 
to add Visual Basic functions to the report, but 
calling an external DLL is far more complex and 
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the subject of a future article. The problem is 
that code embedded in a report, be it a simple 
expression or a sophisticated set of Visual Basic 
functions, must be incorporated in every single 
report that invokes it. Because of this require- 
ment, it makes sense to be strategic with these 
routines—don't go too far toward building a set 
of reports that use this logic. That way you can 
build a template report that includes the report- 
level logic so that it won't have to be added on 
a piecemeal basis. 

The Report Designer custom code interface in 
the Report Properties dialog box doesn’t provide 
anything except a blank text box in which to save 
the Visual Basic functions you provide, so you 
probably want to add a separate Visual Basic class 
to the report project or build and test the func- 
tions independently. Once you're happy with the 
code, simply copy the source code to the Report 
Code window, as Figure 8 shows. This custom 
function is used to change the coded Product Line 
designation to a human-readable value. 

The code used to invoke a Visual Basic func- 
tion in an expression is straightforward—it’s too 
bad that the Report Designer doesn't recognize 
these functions to permit you to correctly point 
to them. This function is invoked on each row 
of the Tablix in the Product Line cell. The 
expression invokes the named function. Note 
that IntelliSense doesn't recognize the function 
(or any report-defined function). 

The report supplied with this article has quite 
a few more expressions built in. As Figure 9 
shows, the report captures several parameters, 
all of which are used in expressions to focus the 
Query report on a specific subset of the data. 
Because the report is recorded as a snapshot, the 
initial query isn't repeated when the user chooses 
another Product Line, price range, or color. I 
suggest you investigate the specific cells in the 
Tablix data region control to see how these are 
coded. Hopefully you won't find any surprises 
when you look through them, because they all 
use the same approach to building a Boolean 
left-hand side of the Filter expression. 

By this point, I hope you've learned some- 
thing about the inner workings of the SSRS 
engine and how the report processor handles 
requests for reports. Building an efficient query 
is just the beginning to building an efficient 
report, and you can use filters to minimize the 
amount of additional (often expensive) queries 
used to fetch more focused data. Remember that 
a snapshot report can be generated and stored 
indefinitely, so report consumers can use the 
results of your work over and over again. В 
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A peek at SQL Server 2008 R2's new edition 


his summer Microsoft will release the SQL 
i Server 2008 R2 Parallel Data Warehouse 
(PDW) Edition, its first product in the 
Massively Parallel Processor (MPP) data ware- 
house space. PDW uniquely combines MPP software 
acquired from DATAllegro, parallel compute nodes, 
commodity servers, and disk storage. PDW lets you 
scale out enterprise data warehouse solutions into the 
hundreds of terabytes and even petabytes of data for 
the most demanding customer scenarios. In addition, 
because the parallel compute nodes work concur- 
rently, it often takes only seconds to get the results of 
queries run against tables containing trillions of rows. 
For many customers, the large data sets and the fast 
query response times against those data sets are game- 
changing opportunities for competitive advantage. 

The simplest way to think of PDW is a layer of 
integrated software that logically forms an umbrella 
over the parallel compute nodes. Each compute node 
is a single physical server that runs its own instance 
of the SQL Server 2008 relational engine in a shared- 
nothing architecture. In other words, compute node 1 
doesn't share CPU, memory, or storage with compute 
node 2. 

Figure 1 shows the architecture for a PDW data 
rack. The smallest PDW will take up two full racks of 
space in a data center, and you can add storage and 
compute capacity to PDW one data rack at a time. 
A data rack contains 8 to 10 compute servers from 
vendors such as Bull, Dell, HP, and IBM, and Fibre 
Channel storage arrays from vendors such as EMC’, 
HP, and IBM. The sale of PDW includes precon- 
figured and pretested software and hardware that's 
tightly configured to achieve balanced throughput 
and I/O for very large databases. Microsoft and these 
hardware vendors provide full planning, implementa- 
tion, and configuration support for PDW. 

The collection of physical servers and disk storage 
arrays that make up the MPP data warehouse is often 
referred to as an appliance. Although the appliance 
is often thought of as a single box or single database 
server, a typical PDW appliance is comprised of 
dozens of physical servers and disk storage arrays 
all working together, often in parallel and under the 
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orchestration of a single server called the control node. 
The control node accepts client query requests, then 
creates an MPP execution plan that can call upon one 
or more compute nodes to execute different parts of 
the query, often in parallel. The retrieved results are 
sent back to the client as a single result set. 


Taking a Closer Look 

Let's dive deeper into PDW’s architecture in Figure 1. 
As I mentioned previously, PDW has a control node 
that clients connect to in order to query a PDW data- 
base. The control node has an instance of the SQL 
Server 2008 relational engine for storing PDW meta- 
data. It also uses this engine for storing intermediate 
query results in TempDB for some query types. The 
control node can receive the results of intermediate 
query results from multiple compute nodes 
for a single query, store those results in 
SQL Server temporary tables, then merge 
those results into a single result set for final 
delivery to the client. 

The control node is an active/passive cluster server. 
Plus, there’s a spare compute node for redundancy 
and failover capability. 

A PDW data rack contains 8 to 10 compute 
nodes and related storage nodes, depending on the 
hardware vendor. Each compute node is a physical 
server that runs a standalone SQL Server 2008 rela- 
tional engine instance. The storage nodes are Fibre 
Channel-connected storage arrays containing 10 to 
12 disk drives. 

You can add more capacity by adding data racks. 
Depending on disk sizes, a data rack can contain in the 
neighborhood of 30TB to 40TB of useable disk space. 
(These numbers can grow considerably if 750GB or 
larger disk drives are used by the hardware vendor.) 
The useable disk space is all RAID 1 at the hardware 
level and uses SQL Server 2008 page compression. 
So, if your PDW appliance has 10 full data racks, 
you have 300TB to 400TB of useable disk space and 
80 to 100 parallel compute nodes. As of this writing, 
each compute node is a two-socket server with each 
CPU having at least four cores. In our example, that’s 
640 to 800 CPU cores and lots of Fibre Channel 
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disk storage. Tm not sure how many organizations 
currently need that much CPU and storage capacity 
for their enterprise data warehouses. However, in the 
words of my big brother, *It's coming!" 

Besides the control node, PDW has several addi- 
tional nodes: 

Landing zone node. This node is used to run 
dwloader, a key utility for high-speed parallel loading 
of large data files into databases, with minimal 
impact to concurrent queries executing on PDW. With 
dwloader, data from a disk or SQL Server Integration 
Services (SSIS) pipeline can be loaded, in parallel, to all 
compute nodes. A new high-speed destination adapter 
was developed for SSIS. Because the destination 
adapter is an in-memory process, SSIS data doesnt 
have be staged on the landing zone prior to loading. 

Backup node. This node is used for backing up 
user databases, which are physically spread across all 
compute nodes and their related storage nodes. When 
backing up a single user database, each compute node 
backs up, in parallel, its portion of the user database. 
To perform the backups, the user databases leverage 
the standard SQL Server 2008 backup functionality 
that's provided by the SQL Server 2008 relational 
engine on each compute node. 

Management node. This node runs the Windows 
Active Directory (AD) domain controller (DC) for 
the appliance. It's also used to deploy patches to all 
nodes in the appliance and hold images in case a node 
needs reimaging. 


Understanding the Table Types 

PDW has two primary types of tables: replicated and 
distributed. Replicated tables exist on every compute 
node. This type of table is most often used for dimen- 
sion tables. Dimension tables are often small, so 
keeping a copy of them on each compute node often 


improves queries' join performance because the data 
doesn't have to be shuffled between compute nodes to 
handle certain types of parallel queries or dimension- 
only queries. Very large dimension tables might be 
candidates for distributed tables. 

Distributed tables are typically used for large fact 
or transaction tables that contain billions or even 
trillions of rows. PDW automatically creates distribu- 
tions for a distributed table. Distributions are separate 
physical tables at the SQL Server instance level on a 
compute node. Metadata on the control node keeps 
track of the mapping of a single distributed table 
and all its constituent distributions on each compute 
node. 

PDW automatically creates eight distributions 
per compute node for a distributed table. (As of this 
writing, the number of distributions isn't configu- 
rable.) Therefore, a PDW appliance with 10 compute 
nodes has 80 total distributions per distributed table. 
Loading 100 billion rows into a distributed table will 
cause that table to be distributed across all 80 dis- 
tributions, providing a suitable distribution key is 
chosen. 

A single-attribute column from a distributed table 
is used as the distribution key. PDW hashes distrib- 
uted rows from a distributed table across the distribu- 
tions as evenly as possible. Choosing the right column 
in a distributed table is a big part of the design process 
and will likely be the topic of many future articles and 
best practices. Suffice it to say, some trial and error 
is inevitable. Fortunately, the dwloader utility can 
quickly reload STB or more of data, which is often 
enough data to test a new design. 


Creating Databases and Tables 

To create databases and tables in PDW, you use code 
that is aligned with ANSI SQL 92 but has elements 
unique to PDW. To create a database, you use the 

CREATE DATABASE command. This command 

has four arguments: 

e AUTOGROW, which you use to specify 
whether you want to allow data and log files to 
automatically grow when needed. 

e REPLICATED SIZE, which you use to specify 
how much space to initially reserve for replicated 
tables on each compute node. 

e DISTRIBUTED SIZE, which you use to specify 
how much space to initially reserve for distributed 
tables. This space is equally divided among all the 
compute nodes. 

• LOG SIZE, which you use to specify how much 
space to initially reserve for the transaction 
log. This space is equally divided among all the 
compute nodes. 


For example, the following CREATE DATA- 
BASE command will create a user database named 
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my_DB that has 16TB of distributed data space, 1TB 
of replicated table space, and 800GB of log file space 
on a PDW appliance with eight compute nodes: 


CREATE DATABASE my_DB 

WITH ( AUTOGROW = ON 
,REPLICATED_SIZE = 1024 GB 
,DISTRIBUTED_SIZE = 16,384 GB 
,LOG_SIZE = 800 GB ) 


A total of 8TB of usable disk space (1024GB x 8 
compute nodes) will be consumed by replicated 
tables because each compute node needs enough disk 
space to contain a copy of each replicated table. Two 
terabytes of usable disk space will be consumed by 
each of the 8 compute nodes (16,384GB / 8 compute 
nodes) for distributed tables. Each compute node will 
also consume 100GB of usable disk space (800GB / 
8 compute nodes) for log files. As a general rule of 
thumb, the overall log-file space for a user database 
should be estimated at two times the size of the largest 
data file being loaded. 

When creating a new user database, you won't 
be able to create file groups. PDW does this auto- 
matically during database creation because file group 
design is tightly configured with the storage to achieve 
overall performance and I/O balance across all com- 
pute nodes. 

After the database is created, you use the CREATE 
TABLE command to create both replicated and dis- 
tributed tables PDW's CREATE TABLE command 
is very similar to a typical SQL Server CREATE 
TABLE command and even includes the ability to 
partition distributed tables as well as replicated tables. 
The most visible difference in this command on PDW 
is the ability to create a table as replicated or to create 
a table as distributed. 

As a general rule of thumb, replicated tables 
should be 1GB or smaller in size. Listing 1 contains 
a sample CREATE TABLE statement that creates 
a replicated table named DimAccount. As you 
can see, the DISTRIBUTION argument is set to 
REPLICATE. 

Generally speaking, distributed tables 
are used for transaction or fact tables that are 
often much larger than 1GB in size. In some 
cases, large dimension tables—for example, a 
500-million row customer account table—is a 
better candidate for a distributed table. Listing 2 
contains code that creates a distributed table 
named FactSales. (You can download the code in 
Listing 1 and Listing 2 by going to www.sqlmag 
.com, entering 125098 in the InstantDoc ID text 
box, clicking Go, then clicking the Download the 
Code Here button.) As I mentioned previously, 
a single-attribute column must be chosen as the 
distribution key so that data loading can be hash 
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LISTING І: Code that Creates 
a Replicated Table 
CREATE TABLE DimAccount 
C 
AccountKey int NOT NULL, 
ParentAccountKey int NULL, 
AccountCodeAlternateKey int NULL, 
ParentAccountCodeAlternateKey int NULL, 
AccountDescription пуагсһаг(50) , 
AccountType nvarchar(58) , 
Operator nvarchar(50) , 
CustomMembers nvarchar (300) , 
ValueType пуагсһаг(50) , 
CustomMemberOptions nvarchar (299) 


) 
WITH (CLUSTERED INDEX(AccountKey) , 
DISTRIBUTION = REPLICATE); 


LISTING 2: Code that Creates 
a Distributed Table 


CREATE TABLE FactSales 
(6 


StoreIDKey int NOT NULL, 
ProductKey int NOT NULL, 

DateKey int NOT NULL, 

SalesQty int NOT NULL, 

SalesAmount decimal(18,2) NOT NULL 


D 
WITH (CLUSTERED INDEX(DateKey), 
DISTRIBUTION - HASH(StoreIDKey)); 


distributed as evenly as possible across all the com- 
pute nodes and their distributions. For a retailer 
with a large point of sale (POS) fact table and a 
large store-inventory fact table, a good candidate 
for the distribution key might be the column that 
contains the store ID. By hash distributing both fact 
tables on the store ID, you might create a fairly even 
distribution of the rows across all compute nodes. 
Also, PDW will co-locate on the same distribution 
(i.e., rows from the POS fact table and rows from 
the store-inventory fact table for the same store ID). 
Co-located data is related, so queries that access 
POS data and related store inventory data should 
perform very well. 

To take full advantage of PDW, designing data- 
bases and tables for the highest-priority queries 1s 
crucial. PDW excels at scanning and joining large 
distributed tables, and often queries against these 
large tables are mission critical. A good database 
design on PDW often takes a lot of trial and 
error. What you learned in the single server data- 
base world isn't always the same in the MPP data 
warehouse world. For instance, clustered indexes 
can work well for large distributed tables, but non- 
clustered indexes can degrade query performance 
in some cases because of the random I/O patterns 
they create on the disk storage. PDW is tuned and 
configured to achieve high rates of sequential I/O 
against large tables. For many queries, sequential 
I/O against a distributed table can be faster than 
using nonclustered indexes, especially under con- 
current workloads. In the MPP data warehouse 
world, this is known as an index-light design. 


> 
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Querying Tables 

After the tables are loaded with data, clients can 
connect to the control node and use SQL statements 
to query PDW tables. For example, the following 
query runs against the FactSales table created 
with Listing 2, leveraging all the parallel compute 
nodes and the clustered index for this distributed 
table: 


SELECT StoreIDKey, SUM(SalesQty) 
FROM dbo.FactSales 


WHERE DateKey >= 20090401 
AND DateKey <= 20090408 
AND ProductKey = 2591 

GROUP BY StoreIDKey 


This query performs exceedingly well against very large 
distributed tables. It’s distribution-compatible and 
aggregation-compatible because each compute node 
can answer its part of the parallel query in its entirety 
without shuffling data among compute nodes or 
merging intermediate result sets on the control node. 


Once the control node receives that 
query, PDW’s MPP engine performs its 
magic by taking the following steps: 

1. It parses the SQL text. 

2. It validates and authorizes all 
objects. 

3. It builds an MPP execution plan. 

4. It runs the MPP execution plan by 
executing SQL SELECT commands in 
parallel on each compute node. 

5. It gathers and merges all the par- 
allel result sets from the compute nodes. 

6. It returns a single result set to the 
client. 
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Data Quality Toolkit 


Plug these data quality transforms 
into your SSIS pipeline 


As you can see, although queries appear 
to be run against a single table, in 
reality, they're run against a multitude 
of tables. 

The MPP engine is responsible for 
a varlety of features and functions 
in PDW. They include appliance con- 
figuration management, authentication, 
authorization, schema management, 
MPP query optimization, MPP query 
execution control, client interaction, 
metadata management, and collection 
of hardware status information. 
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across data sets The Power of PDW 


PDW's power lies in large distributed 
tables and parallel compute nodes that 
scan those distributed tables to answer 
queries. Thus, PDW is well-suited for 
vertical industries (e.g., retail, telecom, 
logistics, hospitality), where large 
amounts of transactional data exist. 
It doesn't take too much of a leap to 
consider PDW well-suited for mission- 
critical applications (e.g., for the military 
or law enforcement), where lives depend 
on such capabilities. I hope you're as 
excited as I am to work on real-world 
and mission-critical solutions with this 
new technology. En 
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Panorama NovaView Suite 


p Software's Panorama NovaView is a 
suite of analytical and reporting software prod- 
ucts driven by the centralized NovaView Server. There 
are two editions of NovaView: Standard Edition and 
Enterprise Edition. All NovaView components are 
installed on the server, except for NovaView Spot- 
light, which is an extension to Microsoft Office, so it's 
installed on client machines. 

NovaView currently ships only in an x86 build. 
The software is highly multi-threaded and has built-in 
support for connection pooling. The key hardware 
requirements are that the number of CPUs be pro- 
portional to the number of concurrent users (with a 
recommended minimum of 4 CPUs and roughly two 
cores per 100 users) and that enough physical RAM 
be present to run all the pnSessionHost.exe processes 
(with a recommended minimum of 4GB). 

On the software side, NovaView requires Windows 
Server 2008 or Windows Server 2003, IIS 6 or higher, 
.NET Framework 2.0 or higher, and Microsoft Visual 
J# 2.0. If you're going to source data from SQL 
Server Analysis Services (SSAS) cubes, you'll need 
a separate server installation with SSAS. NovaView 
can also work with many other mainstream enterprise 
data sources. 

NovaView Server provides infrastructure services 
for the entire NovaView suite of client tools. It sup- 
ports a wide variety of data sources, including the 
SQL Server relational engine, SSAS, Oracle, SAP, 
flat files, and web services. NovaView Server is a 
highly scalable piece of software that can support 
up to thousands of users and terabytes of data, 
according to reports from Panorama Software's 
established customers. 

The NovaView Dashboards, NovaView Visuals, 
and NovaView GIS Framework components pro- 
vide the next layer of business intelligence (BI) 
delivery, including basic analytics and other visual- 
izations. NovaView Dashboards provides a mature 
modern-day dashboarding product that lets you 
create complex views from NovaView Server. Both 
Key Performance Indicators (KPIs) and charts are 
easily created and combined to form enterprise 
dashboards. 

NovaView Visuals provides advanced informa- 
tion visualization options for NovaView Dashboards. 
For example, capabilities equivalent to ProClarity's 
Decomposition Tree are included as part of NovaView 
Visuals. With one click, you can map out analytics that 
come from NovaView Analytics. 

NovaView SharedViews represents a joint ven- 
ture between Panorama Software and Google. This 
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component lets business users publish their insights 
to the cloud for further collaboration. It requires a 
Google Docs account, and you can use it to supply 
information to non-traditional parties, such as 
company suppliers and partners. It's easy to publish 
reports, and the reports look the same no matter 
which edition of NovaView Analytics you're using. 
The product has its own security model to ensure 
your company's information is available to only 
those you trust. 

One of the most interesting features of NovaView 
is the company's upcoming support for Microsoft 
PowerPivot. PowerPivot will be just another data 
source from the perspective of the NovaView suite. 
You might wonder why anyone would want to use 
another BI tool with PowerPivot. PowerPivot is an 
outstanding self-service BI product, but it's also 
a version-one product. There are a few areas of 
PowerPivot that Microsoft has left to improve upon 
that NovaView will provide, including complex 
hierarchies, data security, and additional data visual- 
ization options. 

NovaView offers end-to-end BI delivery, and it 
does it quite well. Panorama has clearly used its deep 
knowledge of OLAP and MDX to produce some of 
the very best delivery options on the market today. 
Businesses that are looking to extend their existing 
Microsoft data warehouse and BI solutions or make 
PowerPivot enterprise-ready should strongly con- 
sider NovaView. Given the sheer breadth and depth 
of the suite, it’s obvious that not all customers will 
need all of its components. Small-to-midsized busi- 
nesses might find NovaView’s relatively high cost 
prohibitive. SOL} 
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(dcomingore @ bivoyage.com) is a principal 
architect with Bl Voyage, a Microsoft Partner 
that specializes in business intelligence services 
and solutions. He’s a SQL Server MVP and holds 
several Microsoft certifications. 


Pros: All user-facing components are browser based; supports both 


OLAP and non-OLAP data sources; components are tightly bound; supports 


core needs of both business and IT users 


Cons: High price; additional server components required; neither edition is as 
graphically rich or fluid as alternatives such as Tableau Software’s client 


Rating: Wwe 


Price: Server licenses range from $12,000 to $24,000, depending on con- 
figuration; client licenses range from $299 to $14,000 


Recommendation: If you're in the market for a third-party toolset to add 
functionality to Microsoft’s BI tools, your search is over. But if you only need a 
few of the suite’s functions, its cost could be prohibitive. 
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PowerPivot vs. Tableau 


This article is a summarized version of Derek 
Comingore's original blog. To read the full article, go to 
sqlmag. com/golSQLServer BI. 


Microsoft PowerPivot 2010 
PowerPivot is composed of Desktop (PowerPivot for 
Excel) and Server (PowerPivot for SharePoint) com- 
ponents. The client experience is embedded directly 
within Microsoft Excel, so its authoring experience is 
Excel. Users can create custom measures, calculated 
fields, subtotals, grand totals, and percentages. It uses 
a language called Data Analysis eXpressions (DAX) 
to create custom measures and calculated fields. 
PowerPivot ships in x64 and leverages a column- 
oriented in-memory data store, so you can work with 
massive volumes of data efficiently. DAX provides an 
extensive expression language to build custom mea- 
sures and fields with. PowerPivot for Excel supports 
practically any data source available. PowerPivot for 
SharePoint offers a wealth of features from Share- 
Point. On the downside, PowerPivot for Excel can be 
confusing and DAX is very complex. PowerPivot for 
Excel does not support parent/child hierarchies either. 


Tableau 5.1 

Tableau Desktop provides an easy-to-use, drag-and- 
drop interface letting anyone create pivot tables and 
data visualizations. Dashboard capabilities are also 
available in Tableau Desktop by combining multiple 
worksheets into a single display. 

Tableau’s strength lies in the product's visualiza- 
tion and easy-to-use authoring capabilities. The for- 
mula language is easy enough to use and build custom 
measures and calculated fields with. Both Tableau 
Server and Desktop installations are extremely easy 
to perform. However, working with massive volumes 
of data can be painful. Tableau's formula language 
is impressive in its simplicity but isn't as extensive as 
PowerPivot’s DAX. Tableau Server is a good server 
product but cannot offer the wealth of features found 
in SharePoint Server. 

If your company is looking for an IT-oriented 
product that is geared for integration with corporate 
BI solutions, PowerPivot is a no brainer. If your 
company is looking for a business-user centric 
platform with little IT usage or corporate BI 
integration, Tableau should be your choice. [SQL | 
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меня SQL Server 2008 and 64-bit support 


y” Rollback or reconstruct operations 


The Ultimate Log Reading, Auditing Tool 


е Integrates with Database backups for improved audit trails 


v/ Analyze historical activity of each transaction 


v Export transactions to XML, BULK SQL, CSV, etc. 
y” Read and analyze SQL Server transaction logs 
Y Access to programmable API for automation 


Y Command line interface included as standard feature 


For more information 
or to download a free trial version go to: 


software 


Zz ApexSQL = wwwaapexsql.com 


or phone 866-665-5500 


Derek Comingore 


(dcomingore @ bivoyage.com) is a principal 
architect with B.I. Voyage, a Microsoft Partner 
that specializes in business intelligence services 
and solutions. He's a SQL Server MVP and 
holds several Microsoft certifications. 
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BUSINESS INTELLIGENCE 
Lyzasoft Enhances Data Collaboration Tool 
Lyzasoft has announced Lyza 2.0. This version lets groups within an enterprise synthesize data from many 
sources, visually analyze the data, and compare their findings among the workgroup. New features include 
micro-tiling, improved user format controls, ad hoc visual data drilling, n-dimensional charting, advanced 
sorting controls, and a range of functions for adding custom fields to charts. Lyza 2.0 also introduces new 
collaboration features, letting users interact with content in the form of blogs, charts, tables, dashboards, 
and collections. Lyza costs $400 for a one-year subscription and $2,000 per user for a perpetual license. To 
learn more, visit www.lyzasoft.com. 


DATABASE ADMINISTRATION 

Attunity Updates Change Data Capture Suite 

Attunity announced a new release of its change data capture and operational data replication software, 
Attunity, with support for SQL Server 2008 R2. Attunity now tracks log-based changes across all versions of 
SQL Server, supports data replication into heterogeneous target databases, and fully integrates with SQL Server 
Integration Services (SSIS) and Business Intelligence Development Studio (BIDS). To learn more or download 


Got a great 
new product? 
Send announce- 
ments to products @ 
sqlmag.com. 
— Brian Reinholz, 
production editor 


a free trial, visit www.attunity.com. 
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Easily Design PDF Flow Charts 

Aivosto has released Visustin 6.0, a flow chart generator that converts 
T-SQL code to flow charts. The latest version can create PDF flow 
charts from 36 programming languages—the new version adds sup- 
port for JCL, Matlab, PL/I, Rexx and SAS code. Visustin reads source 
code and visualizes each function as a flow chart, letting it see how the 
functions operate. With the software, you can easily view two charts side 
by side, making for easy comparisons. The standard edition costs $249 
and the pro edition costs $499. To learn more, visit www.aivosto.com. 
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double average(double sum, int count) ( 
/* Calculate average from sum and count */ 


"double average; 


if (count == 1) 
return sum; 
else if (count > 0) 
return sum / count; 
else 
return 0; /* Invalid value of Count */ 


double 
sum, int count) 


DATABASE ADMINISTRATION 


Embarcadero Launches Multi-platform DBArtisan XE 

Embarcadero Technologies introduced DBArtisan XE, a solution that lets DBAs maximize the performance 
of their databases regardless of type. DBArtisan XE helps database administrators manage and optimize the 
schema, security, performance, and availability of all their databases to diagnose and resolve database issues. 
DBArtisan XE is also the first Embarcadero product to include Embarcadero ToolCloud as a standard feature. 
ToolCloud provides centralized licensing and provisioning, plus on-demand tool deployment, to improve tool 
manageability for IT organizations with multiple DBArtisan users. DBArtisan starts at $1,100 for five server 
connections. To learn more or download a free version, visit www.embarcadero.com. 
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DBMoto 7 Adds Multi-server Synchronization 
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Data Types 


D: with large object (LOB) data is one of 
the challenges of managing SQL Server installa- 
tions. LOBs are usually composed of pictures but they 
can contain other data as well. Typically these images 
are used to display product images or other graphical 
media on a website, and they can be quite large. 

SQL Server has many data types that can be used 
for different types of LOB storage, but picking the right 
one for LOB storage can be difficult —if you even want 
to store the LOBs in the database at all. Many DBAs 
prefer to keep LOBs out of the database. 

The basic rule of thumb is that LOBs smaller than 
256KB perform best stored inside a database while LOBs 
larger than 1MB perform best outside the database. 

Storing LOBs outside the database offers perfor- 
mance advantages, but it also jeopardizes data because 
there's no built-in mechanism to ensure data integrity. 
To help you find the best way to store your LOB data, 
it's necessary to understand the differences between 
SQL Server's LOB data types. 


TEXT 

Used for non-Unicode variable-length text data, this 
data type can't be used for binary data. The TEXT data 
type supports data up to 2^31-1(2,147,483,647) —2GB. 
The TEXT data type is deprecated, but it's still present 
in SQL Server 2008 R2. 


NTEXT 

Used for Unicode variable-length text data, like 
the TEXT data type, this data type doesn't support 
binary data. The NTEXT data type supports data 
up to 2^30-1(1,073,741,823)—1GB. The NTEXT 
data type is deprecated but is still present in SQL 
Server 2008 R2. 


IMAGE 

Used for variable length binary data, the IMAGE 
data type is the traditional LOB storage type for 
SQL Server, and you can store both text and binary 
data in it. The IMAGE data type supports data up 
to 2431-1(2,147,483,647)—2GB. The IMAGE data 


"338 SQL Server 2008 LOB 


type is deprecated, but it's still present in SQL Server 
2008 R2. 


VARCHAR(MAX) 

Used for non-Unicode variable length text data, the 
VARCHAR(MAX) data type supports data up to 
2^31—1(2,147,483,647) —2GB. The VARCHAR(MAX) 
data type was added with SQL Server 2005 and is 
current. 


NVARCHAR(MAX) 
Used for Unicode variable length text data, the 
NVARCHAR(MAX) data type supports data up to 
2^30-1(1,073,741,823)—1GB. The NVARCHAR 
(MAX) data type was added with SQL Server 2005 
and is current. 


FILESTREAM 

The FILESTREAM data type combines the perfor- 
mance of accessing LOBs directly from the NTFS 
file system with the referential integrity and direct 
access through the SQL Server relational database 
engine. It can be used for both binary and text 
data, and it supports files up to the size of the disk 
volume. 

The FILESTREAM data type is enabled using a 
combination of SQL Server and database configura- 
tion and the VARBINARY(MAX) data type. It was 
added with SQL Server 2008 and is current. 

You can find more information about this data 
type in “Using SQL Server 2008 FILESTREAM 
Storage" at www.sqlmag.com, InstantDoc ID 101388, 
and “Using SQL Server 2008’s FILESTREAM Data 
Type” at www.sqlmag.com, InstantDoc ID 102068. 


XML 

Used for storing XML-based data, the XML data 

type was enhanced in SQL Server 2008. It sup- 

ports data up to 2^31-1(2,147,438,647) — 2GB. It 

was added with the SQL Server 2005 release and is 

current. 500 
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WHAT'S SLOWING YOU DOWN? 


IT MIGHT BE MORE OBVIOUS THAN YOU THINK 
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SQL Sentry products are made by DBAs, for DBAs. They provide unparalleled 
insight, awareness and control over your SQL Server environment. With 
features like real-time and historical performance analysis and visual 
schedule management, SQL Sentry is the ultimate performance solution. 
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